[jira] [Commented] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.
[ https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995112#comment-14995112 ]

Lei (Eddy) Xu commented on HDFS-9267:
-------------------------------------

Will fix these tests in the next patch.

> TestDiskError should get stored replicas through FsDatasetTestUtils.
> ---------------------------------------------------------------------
>
>             Key: HDFS-9267
>             URL: https://issues.apache.org/jira/browse/HDFS-9267
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>      Components: test
>Affects Versions: 2.7.1
>        Reporter: Lei (Eddy) Xu
>        Assignee: Lei (Eddy) Xu
>        Priority: Minor
>     Attachments: HDFS-9267.00.patch, HDFS-9267.01.patch, HDFS-9267.02.patch, HDFS-9267.03.patch
>
> {{TestDiskError#testReplicationError}} scans local directories to verify blocks and metadata files, which leaks the details of the {{FsDataset}} implementation.
> This JIRA will abstract the "scanning" operation to {{FsDatasetTestUtils}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications
[ https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-9381:
--------------------------------------
    Description: 
Currently we just return null if the block already exists in pendingReplications in the replication flow for striped blocks.

{code}
if (block.isStriped()) {
  if (pendingNum > 0) {
    // Wait the previous recovery to finish.
    return null;
  }
{code}

If we just return null and neededReplications contains only a few blocks (by default, fewer than numliveNodes*2), the same blocks can be picked again from neededReplications on the next loop, since we never remove the element from neededReplications. Because this replication process needs to take the FSNamesystem lock, we may spend time unnecessarily on every loop.

So my suggestion/improvement is: instead of just returning null, how about incrementing pendingReplications for this block and removing it from neededReplications?

Another point to consider: to add a block to pendingReplications, we generally need a target, i.e. the node to which we issued the replication command. Later, after replication succeeds and the DN reports it, the block is removed from pendingReplications in NN addBlock. Since this is a newly picked block from neededReplications, we have not selected a target yet. So which target should be passed to pendingReplications if we add this block?

One option I am thinking of is to pass srcNode itself as the target for this special condition. If the block is really missed, srcNode will not report it, so the block will not be removed from pendingReplications; when it times out, it will be considered for replication again, and at that time it will find an actual target to replicate to while processing as part of the regular replication flow.
  was:
Currently we just return null if the block already exists in pendingReplications in the replication flow for striped blocks.

{code}
if (block.isStriped()) {
  if (pendingNum > 0) {
    // Wait the previous recovery to finish.
    return null;
  }
{code}

Here, if neededReplications contains only a few blocks (by default, fewer than numliveNodes*2), the same blocks can be picked again from neededReplications if we just return null, since we do not remove the element from neededReplications. Because this replication process needs to take the FSNamesystem lock, we may spend time unnecessarily on every loop.

So my suggestion/improvement is: instead of just returning null, how about incrementing pendingReplications for this block and removing it from neededReplications? Another point to consider: to add a block to pendingReplications, we generally need a target, i.e. the node to which we issued the replication command. Later, after replication succeeds and the DN reports it, the block is removed from pendingReplications in NN addBlock. Since this is a newly picked block from neededReplications, we have not selected a target yet. So which target should be passed to pendingReplications if we add this block?

One option I am thinking of is to pass srcNode itself as the target for this special condition. If the block is really missed, srcNode will not report it, so the block will not be removed from pendingReplications; when it times out, it will be considered for replication, and at that time it will find an actual target to replicate to.
> When same block came for replication for Striped mode, we can move that block to PendingReplications
> ----------------------------------------------------------------------------------------------------
>
>             Key: HDFS-9381
>             URL: https://issues.apache.org/jira/browse/HDFS-9381
>         Project: Hadoop HDFS
>      Issue Type: Sub-task
>      Components: erasure-coding, namenode
>Affects Versions: 3.0.0
>        Reporter: Uma Maheswara Rao G
>        Assignee: Uma Maheswara Rao G
>
> Currently we just return null if the block already exists in pendingReplications in the replication flow for striped blocks.
> {code}
> if (block.isStriped()) {
>   if (pendingNum > 0) {
>     // Wait the previous recovery to finish.
>     return null;
>   }
> {code}
> If we just return null and neededReplications contains only a few blocks (by default, fewer than numliveNodes*2), the same blocks can be picked again from neededReplications on the next loop, since we never remove the element from neededReplications. Because this replication process needs to take the FSNamesystem lock, we may spend time unnecessarily on every loop.
> So my suggestion/improvement is:
> Instead of just returning null, how about incrementing pending
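The queue lifecycle proposed in the description above can be sketched as a toy model in plain Java. This is an illustration only, not the NameNode code: the method names (`parkBehindSrcNode`, `onPendingTimeout`) are hypothetical, and the real `neededReplications`/`pendingReplications` structures are far richer than a set and a map.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class PendingMoveSketch {
    final Set<String> neededReplications = new LinkedHashSet<>();
    final Map<String, String> pendingReplications = new HashMap<>(); // block -> placeholder target

    // Called for the "pendingNum > 0" branch quoted above. Today the code
    // returns null and the block stays in neededReplications, where the next
    // loop may pick it again. The proposal: move it to pendingReplications,
    // using srcNode itself as a placeholder target.
    void parkBehindSrcNode(String block, String srcNode) {
        neededReplications.remove(block);
        pendingReplications.put(block, srcNode);
    }

    // srcNode never reports the placeholder replication, so the pending entry
    // eventually times out and the block returns to neededReplications, where
    // the regular flow will choose a real target.
    void onPendingTimeout(String block) {
        pendingReplications.remove(block);
        neededReplications.add(block);
    }

    public static void main(String[] args) {
        PendingMoveSketch s = new PendingMoveSketch();
        s.neededReplications.add("blk_1");
        s.parkBehindSrcNode("blk_1", "dn-src");
        System.out.println(s.neededReplications.contains("blk_1")); // false: not re-scanned each loop
        s.onPendingTimeout("blk_1");
        System.out.println(s.neededReplications.contains("blk_1")); // true: retried with a real target
    }
}
```

The point of the sketch is the invariant: a block is always in exactly one of the two collections, so the FSNamesystem-locked loop never re-examines a block that already has a recovery in flight.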
[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
[ https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995073#comment-14995073 ]

Hudson commented on HDFS-9379:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #1372 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1372/])
HDFS-9379. Make NNThroughputBenchmark support more than 10 datanodes. (arp: rev 2801b42a7e178ad6a0e6b0f29f22f3571969c530)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

> Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
> --------------------------------------------------------------------------
>
>             Key: HDFS-9379
>             URL: https://issues.apache.org/jira/browse/HDFS-9379
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>      Components: namenode
>        Reporter: Mingliang Liu
>        Assignee: Mingliang Liu
>             Fix For: 2.8.0
>
>     Attachments: HDFS-9379.000.patch
>
> Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on the {{datanodes}} array being sorted in the lexicographical order of each datanode's {{xferAddr}}.
> * There is an assertion of the datanodes' {{xferAddr}} lexicographical order when filling {{datanodes}}, see [the code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152].
> * When searching for a datanode by {{DatanodeInfo}}, it uses binary search against the {{datanodes}} array, see [the code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187].
> In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In {{NNThroughputBenchmark}}, the port is simply _the index of the tiny datanode_ plus one.
> The problem here is that, when there are more than 9 tiny datanodes ({{numThreads}}), the lexicographical order of the datanodes' {{xferAddr}} is no longer valid, because the string value of the datanode index is not in lexicographical order any more. For example:
> {code}
> ...
> 192.168.54.40:8
> 192.168.54.40:9
> 192.168.54.40:10
> 192.168.54.40:11
> ...
> {code}
> {{192.168.54.40:9}} is greater than {{192.168.54.40:10}}. The assertion will fail and the binary search won't work.
> The simple fix is to calculate the datanode index from the port directly, instead of using binary search.
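The ordering failure described above is easy to reproduce with plain `String.compareTo`, and the proposed fix needs no sorting at all, since the benchmark assigns each datanode the port "index plus one". A minimal sketch; `dnIndexFromAddr` is an illustrative helper, not the patch's actual code:

```java
public class XferAddrOrder {
    // Recover the datanode index from an xferAddr of the form host:port,
    // assuming (as NNThroughputBenchmark does) port = datanode index + 1.
    static int dnIndexFromAddr(String xferAddr) {
        int port = Integer.parseInt(xferAddr.substring(xferAddr.lastIndexOf(':') + 1));
        return port - 1;
    }

    public static void main(String[] args) {
        String a = "192.168.54.40:9";
        String b = "192.168.54.40:10";
        // String comparison is character-by-character: after the common
        // prefix "192.168.54.40:", it compares '9' to '1', so ":9" sorts
        // after ":10" even though 9 < 10 numerically.
        System.out.println(a.compareTo(b) > 0);   // true
        System.out.println(dnIndexFromAddr(b));   // 9
    }
}
```

Because the port encodes the index exactly, the lookup is O(1) and works for any number of datanodes, which is why computing the index directly is simpler than repairing the sort order for the binary search.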
[jira] [Commented] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.
[ https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995068#comment-14995068 ]

Hadoop QA commented on HDFS-9267:
---------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 6s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| +1 | mvninstall | 3m 20s | trunk passed |
| +1 | compile | 0m 38s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 0m 34s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 0m 18s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| -1 | findbugs | 2m 5s | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. |
| +1 | javadoc | 1m 15s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 1m 56s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 0m 41s | the patch passed |
| +1 | compile | 0m 37s | the patch passed with JDK v1.8.0_60 |
| +1 | javac | 0m 37s | the patch passed |
| +1 | compile | 0m 33s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 33s | the patch passed |
| +1 | checkstyle | 0m 16s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| -1 | findbugs | 2m 12s | hadoop-hdfs-project/hadoop-hdfs introduced 1 new FindBugs issues. |
| +1 | javadoc | 1m 14s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 2m 6s | the patch passed with JDK v1.7.0_79 |
| -1 | unit | 129m 13s | hadoop-hdfs in the patch failed with JDK v1.8.0_60. |
| -1 | unit | 182m 48s | hadoop-hdfs in the patch failed with JDK v1.7.0_79. |
| -1 | asflicense | 2m 27s | Patch generated 57 ASF License warnings. |
| | | 337m 19s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice$BlockPoolSliceReplicaIterator$DirIterator.next() can't throw NoSuchElementException At BlockPoolSlice.java:[line 456] |
| JDK v1.8.0_60 Failed junit tests | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
| | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
| | hadoop.hdfs.qjournal.TestSecureNNWithQJM |
| | hadoop.hdfs.TestReplication |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 |
| | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
| | hadoop.hdfs.server.datanode.TestBlockScanner |
| | hadoop.hdfs.TestDFSStripedOutputStream |
| | hadoop.hdfs.server.namenode.TestSecurityTokenEditLog |
| | hadoop.hdfs.server.datanode.T
[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
[ https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995062#comment-14995062 ]

Hudson commented on HDFS-9379:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #649 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/649/])
HDFS-9379. Make NNThroughputBenchmark support more than 10 datanodes. (arp: rev 2801b42a7e178ad6a0e6b0f29f22f3571969c530)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
[ https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995047#comment-14995047 ]

Hudson commented on HDFS-9379:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2579 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2579/])
HDFS-9379. Make NNThroughputBenchmark support more than 10 datanodes. (arp: rev 2801b42a7e178ad6a0e6b0f29f22f3571969c530)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-2261) AOP unit tests are not getting compiled or run
[ https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995038#comment-14995038 ]

Hadoop QA commented on HDFS-2261:
---------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 6s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 26 new or modified test files. |
| +1 | mvninstall | 3m 25s | trunk passed |
| +1 | compile | 5m 22s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 5m 4s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 1m 9s | trunk passed |
| +1 | mvneclipse | 0m 34s | trunk passed |
| -1 | findbugs | 2m 16s | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. |
| +1 | javadoc | 2m 26s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 3m 33s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 2m 24s | the patch passed |
| +1 | compile | 5m 15s | the patch passed with JDK v1.8.0_60 |
| +1 | javac | 5m 15s | the patch passed |
| +1 | compile | 4m 56s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 4m 56s | the patch passed |
| +1 | checkstyle | 0m 59s | the patch passed |
| +1 | mvneclipse | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | xml | 0m 0s | The patch has no ill-formed XML file. |
| +1 | findbugs | 4m 11s | the patch passed |
| +1 | javadoc | 2m 14s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 3m 12s | the patch passed with JDK v1.7.0_79 |
| -1 | unit | 7m 28s | hadoop-common in the patch failed with JDK v1.8.0_60. |
| -1 | unit | 63m 29s | hadoop-hdfs in the patch failed with JDK v1.8.0_60. |
| -1 | unit | 7m 58s | hadoop-common in the patch failed with JDK v1.7.0_79. |
| -1 | unit | 58m 13s | hadoop-hdfs in the patch failed with JDK v1.7.0_79. |
| -1 | asflicense | 0m 20s | Patch generated 58 ASF License warnings. |
| | | 188m 28s | |

|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | hadoop.fs.shell.TestCopyPreserveFlag |
| | hadoop.ha.TestZKFailoverController |
| | hadoop.metrics2.impl.TestGangliaMetrics |
| | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
| | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| JDK v1.7.0_79 Failed junit tests | hadoop.fs.shell.TestCopyPreserveFlag |
| | hadoop.hdfs.TestDFSUpgradeFromImage |
[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
[ https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995032#comment-14995032 ]

Hudson commented on HDFS-9379:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #639 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/639/])
HDFS-9379. Make NNThroughputBenchmark support more than 10 datanodes. (arp: rev 2801b42a7e178ad6a0e6b0f29f22f3571969c530)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery
[ https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995014#comment-14995014 ]

Hudson commented on HDFS-9236:
------------------------------

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #579 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/579/])
HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java

> Missing sanity check for block size during block recovery
> ---------------------------------------------------------
>
>             Key: HDFS-9236
>             URL: https://issues.apache.org/jira/browse/HDFS-9236
>         Project: Hadoop HDFS
>      Issue Type: Bug
>      Components: HDFS
>Affects Versions: 2.7.1
>        Reporter: Tony Wu
>        Assignee: Tony Wu
>             Fix For: 2.8.0
>
>     Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, HDFS-9236.006.patch, HDFS-9236.007.patch
>
> Ran into an issue while running tests against faulty datanode code. Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>                  List<BlockRecord> syncList) throws IOException {
>     ...
>     // Calculate the best available replica state.
>     ReplicaState bestState = ReplicaState.RWR;
>     ...
>     // Calculate list of nodes that will participate in the recovery
>     // and the new block size
>     List<BlockRecord> participatingList = new ArrayList<BlockRecord>();
>     final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
>         -1, recoveryId);
>     switch(bestState) {
>     ...
>     case RBW:
>     case RWR:
>       long minLength = Long.MAX_VALUE;
>       for(BlockRecord r : syncList) {
>         ReplicaState rState = r.rInfo.getOriginalReplicaState();
>         if(rState == bestState) {
>           minLength = Math.min(minLength, r.rInfo.getNumBytes());
>           participatingList.add(r);
>         }
>       }
>       newBlock.setNumBytes(minLength);
>       break;
>     ...
>     }
>     ...
>     nn.commitBlockSynchronization(block,
>         newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
>         datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above case, it is possible for none of the rState values (reported by DNs with copies of the replica being recovered) to match the bestState. This can be caused either by faulty DN code or by stale/modified/corrupted files on the DN. When this happens, the DN ends up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for the replica length. See FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>       long newgenerationstamp, long newlength,
>       boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>       String[] newtargetstorages) throws IOException {
>     ...
>     if (deleteblock) {
>       Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
>       boolean remove = iFile.removeLastBlock(blockToDel) != null;
>       if (remove) {
>         blockManager.removeBlock(storedBlock);
>       }
>     } else {
>       // update last block
>       if(!copyTruncate) {
>         storedBlock.setGenerationStamp(newgenerationstamp);
>         // XXX block length is updated without any check <<<
>         storedBlock.setNumBytes(newlength);
>       }
>     ...
>     if (closeFile) {
>       LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>           + ", file=" + src
>           + (copyTruncate ? ", newBlock=" + truncatedBlock
>               : ", newgenerationstamp=" + newgenerationstamp)
>           + ", newlength=" + newlength
>           + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
>     } else {
>       LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
>     }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent block report (even with the correct length) will cause the block to be marked as corrupted, and this block could be the last block of the file. If that happens and the client goes away, the NN won't be able to recover the lease and close the file, because the last block is under-replicated.
> I believe we need a sanity check for the block size on both the DN and the NN to prevent such a case from happening.
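The failure mode in the quoted `syncBlock` snippet reduces to a few lines: the minimum is seeded with `Long.MAX_VALUE` and only updated for replicas whose state matches `bestState`, so zero matches means the seed itself gets committed as the block length. A standalone toy model (plain Java, not the Hadoop classes; `Replica` and `minLengthFor` are illustrative stand-ins):

```java
import java.util.List;

public class MinLengthPitfall {
    // Simplified stand-in for BlockRecord + ReplicaRecoveryInfo.
    record Replica(String state, long numBytes) {}

    // Mirrors the min-over-matching-replicas loop from syncBlock: if no
    // replica's state equals bestState, the loop body never runs and the
    // Long.MAX_VALUE seed is returned unchanged.
    static long minLengthFor(String bestState, List<Replica> syncList) {
        long minLength = Long.MAX_VALUE;
        for (Replica r : syncList) {
            if (r.state().equals(bestState)) {
                minLength = Math.min(minLength, r.numBytes());
            }
        }
        return minLength;
    }

    public static void main(String[] args) {
        // Every reported replica is FINALIZED, but bestState was computed as RWR:
        List<Replica> syncList = List.of(new Replica("FINALIZED", 1024));
        long len = minLengthFor("RWR", syncList);
        System.out.println(len == Long.MAX_VALUE); // true: a bogus length would be committed
    }
}
```

A sanity check of the kind the JIRA proposes would simply reject a computed length of `Long.MAX_VALUE` (or any length no replica actually reported) before calling `commitBlockSynchronization`, on the DN side, the NN side, or both.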
[jira] [Commented] (HDFS-9318) considerLoad factor can be improved
[ https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995016#comment-14995016 ]

Hudson commented on HDFS-9318:
------------------------------

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #579 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/579/])
HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml

> considerLoad factor can be improved
> -----------------------------------
>
>             Key: HDFS-9318
>             URL: https://issues.apache.org/jira/browse/HDFS-9318
>         Project: Hadoop HDFS
>      Issue Type: Bug
>Affects Versions: 2.6.0
>        Reporter: Kuhu Shukla
>        Assignee: Kuhu Shukla
>             Fix For: 3.0.0, 2.8.0
>
>     Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch
>
> Currently considerLoad avoids choosing nodes that are too active, so it helps level the HDFS load across the cluster. Under normal conditions, this is desired. However, when a cluster has a large percentage of nearly full nodes, this can make it difficult to find good targets because the placement policy wants to avoid the full nodes, but considerLoad wants to avoid the busy less-full nodes.
[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995015#comment-14995015 ] Hudson commented on HDFS-6481: -- ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #579 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/579/]) HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 0b18e5e8c69b40c9a446fff448d38e0dd10cb45e) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java > DatanodeManager#getDatanodeStorageInfos() should check the length of > storageIDs > --- > > Key: HDFS-6481 > URL: https://issues.apache.org/jira/browse/HDFS-6481 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Ted Yu >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 2.7.3 > > Attachments: h6481_20151105.patch, hdfs-6481-v1.txt > > > Ian Brooks reported the following stack trace: > {code} > 2014-06-03 13:05:03,915 WARN [DataStreamer for file > /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200 > block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] > hdfs.DFSClient: DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): > 0 > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594) > at > 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266) > at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919) > at > org.apache.hadoop.hdfs.DFSOu
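The commit title above — checking the length of {{storageIDs}} — implies a bounds check before indexing into the array, so a client that sends fewer storage IDs than datanode IDs no longer triggers the {{ArrayIndexOutOfBoundsException}} in the stack trace. A minimal, hypothetical sketch of that idea (the method name and plain {{String[]}} types are simplifications, not the actual HDFS signatures):

```java
import java.util.Arrays;

public class StorageInfoLookup {
    // Hypothetical simplification of DatanodeManager#getDatanodeStorageInfos:
    // pair each datanode ID with a storage ID, but tolerate a short (or null)
    // storageIDs array instead of throwing ArrayIndexOutOfBoundsException.
    public static String[] pairStorageIds(String[] datanodeIds, String[] storageIds) {
        String[] result = new String[datanodeIds.length];
        for (int i = 0; i < datanodeIds.length; i++) {
            // Guard: only index storageIds when the entry actually exists.
            String storage = (storageIds != null && i < storageIds.length)
                ? storageIds[i] : "UNKNOWN";
            result[i] = datanodeIds[i] + "/" + storage;
        }
        return result;
    }

    public static void main(String[] args) {
        // Client sent fewer storage IDs than datanode IDs (the HDFS-6481 scenario).
        String[] paired = pairStorageIds(new String[] {"dn1", "dn2"},
                                         new String[] {"s1"});
        System.out.println(Arrays.toString(paired)); // completes without AIOOBE
    }
}
```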
[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
[ https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994989#comment-14994989 ] Hudson commented on HDFS-9379: -- FAILURE: Integrated in Hadoop-trunk-Commit #8770 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8770/]) HDFS-9379. Make NNThroughputBenchmark support more than 10 datanodes. (arp: rev 2801b42a7e178ad6a0e6b0f29f22f3571969c530) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java > Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes > -- > > Key: HDFS-9379 > URL: https://issues.apache.org/jira/browse/HDFS-9379 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-9379.000.patch > > > Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on > sorted {{datanodes}} array in the lexicographical order of datanode's > {{xferAddr}}. > * There is an assertion of datanode's {{xferAddr}} lexicographical order when > filling the {{datanodes}}, see [the > code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152]. > * When searching the datanode by {{DatanodeInfo}}, it uses binary search > against the {{datanodes}} array, see [the > code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187] > In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In > {{NNThroughputBenchmark}}, the port is simply _the index of the tiny > datanode_ plus one. 
> The problem here is that, when there are more than 9 tiny datanodes > ({{numThreads}}), the lexicographical order of datanode's {{xferAddr}} will > be invalid as the string value of datanode index is not in lexicographical > order any more. For example, > {code} > ... > 192.168.54.40:8 > 192.168.54.40:9 > 192.168.54.40:10 > 192.168.54.40:11 > ... > {code} > {{192.168.54.40:9}} is greater than {{192.168.54.40:10}}. The assertion will > fail and the binary search won't work. > The simple fix is to calculate the datanode index by port directly, instead > of using binary search. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
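The ordering problem in the description is easy to reproduce with plain {{String.compareTo}}, and the proposed fix — deriving the index from the port directly — needs no sorted array or binary search at all. A sketch, assuming the benchmark's port = index + 1 convention quoted above:

```java
public class XferAddrOrder {
    // Derive the datanode index directly from the xferAddr port, as the fix
    // suggests, instead of binary-searching a lexicographically sorted array.
    // (Port = index + 1 is the NNThroughputBenchmark convention quoted above.)
    public static int indexFromXferAddr(String xferAddr) {
        int colon = xferAddr.lastIndexOf(':');
        return Integer.parseInt(xferAddr.substring(colon + 1)) - 1;
    }

    public static void main(String[] args) {
        // Lexicographically, ":9" sorts after ":10" — this is exactly what
        // breaks the sorted-array assertion once there are more than 9 datanodes.
        System.out.println("192.168.54.40:9".compareTo("192.168.54.40:10") > 0); // true
        System.out.println(indexFromXferAddr("192.168.54.40:10")); // 9
    }
}
```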
[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994975#comment-14994975 ] Hudson commented on HDFS-6481: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2518 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2518/]) HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 0b18e5e8c69b40c9a446fff448d38e0dd10cb45e) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > DatanodeManager#getDatanodeStorageInfos() should check the length of > storageIDs > --- > > Key: HDFS-6481 > URL: https://issues.apache.org/jira/browse/HDFS-6481 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Ted Yu >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 2.7.3 > > Attachments: h6481_20151105.patch, hdfs-6481-v1.txt > > > Ian Brooks reported the following stack trace: > {code} > 2014-06-03 13:05:03,915 WARN [DataStreamer for file > /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200 > block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] > hdfs.DFSClient: DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): > 0 > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594) > at > 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266) > at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919) > at > org.apache.hadoop.hdfs.DFSOutputStream
[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery
[ https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994974#comment-14994974 ] Hudson commented on HDFS-9236: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2518 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2518/]) HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java > Missing sanity check for block size during block recovery > - > > Key: HDFS-9236 > URL: https://issues.apache.org/jira/browse/HDFS-9236 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu > Fix For: 2.8.0 > > Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, > HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, > HDFS-9236.006.patch, HDFS-9236.007.patch > > > Ran into an issue while running a test against faulty data-node code. > Currently in DataNode.java: > {code:java} > /** Block synchronization */ > void syncBlock(RecoveringBlock rBlock, > List<BlockRecord> syncList) throws IOException { > … > // Calculate the best available replica state. 
> ReplicaState bestState = ReplicaState.RWR; > … > // Calculate list of nodes that will participate in the recovery > // and the new block size > List<BlockRecord> participatingList = new ArrayList<>(); > final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId, > -1, recoveryId); > switch(bestState) { > … > case RBW: > case RWR: > long minLength = Long.MAX_VALUE; > for(BlockRecord r : syncList) { > ReplicaState rState = r.rInfo.getOriginalReplicaState(); > if(rState == bestState) { > minLength = Math.min(minLength, r.rInfo.getNumBytes()); > participatingList.add(r); > } > } > newBlock.setNumBytes(minLength); > break; > … > } > … > nn.commitBlockSynchronization(block, > newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false, > datanodes, storages); > } > {code} > This code is called by the DN coordinating the block recovery. In the above > case, it is possible for none of the rState (reported by DNs with copies of > the replica being recovered) to match the bestState. This can either be > caused by faulty DN code or stale/modified/corrupted files on DN. When this > happens the DN will end up reporting the minLength of Long.MAX_VALUE. > Unfortunately there is no check on the NN for replica length. 
See > FSNamesystem.java: > {code:java} > void commitBlockSynchronization(ExtendedBlock oldBlock, > long newgenerationstamp, long newlength, > boolean closeFile, boolean deleteblock, DatanodeID[] newtargets, > String[] newtargetstorages) throws IOException { > … > if (deleteblock) { > Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock); > boolean remove = iFile.removeLastBlock(blockToDel) != null; > if (remove) { > blockManager.removeBlock(storedBlock); > } > } else { > // update last block > if(!copyTruncate) { > storedBlock.setGenerationStamp(newgenerationstamp); > > // XXX block length is updated without any check <<< storedBlock.setNumBytes(newlength); > } > … > if (closeFile) { > LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock > + ", file=" + src > + (copyTruncate ? ", newBlock=" + truncatedBlock > : ", newgenerationstamp=" + newgenerationstamp) > + ", newlength=" + newlength > + ", newtargets=" + Arrays.asList(newtargets) + ") successful"); > } else { > LOG.info("commitBlockSynchronization(" + oldBlock + ") successful"); > } > } > {code} > After this point the block length becomes Long.MAX_VALUE. Any subsequent > block report (even with correct length) will cause the block to be marked as > corrupted. Since this block could be the last block of the file, if this > happens and the client goes away, the NN won’t be able to recover the lease and > close the file because the last block is under-replicated. > I believe we need to have a sanity check for block size on both DN and NN to > prevent such a case from happening. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
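The sanity check the reporter asks for can be as small as rejecting the {{Long.MAX_VALUE}} sentinel before it is committed. A hypothetical sketch — the method name is invented for illustration, and the committed patch actually touches {{BlockRecoveryWorker}} and {{ReplicaRecoveryInfo}} rather than a helper like this:

```java
public class BlockLengthSanity {
    // Hypothetical sanity check mirroring what the report asks for: reject a
    // recovered length that is still the Long.MAX_VALUE sentinel (meaning no
    // participating replica matched bestState) or otherwise impossible,
    // before it reaches commitBlockSynchronization.
    public static boolean isSaneRecoveredLength(long newLength) {
        return newLength >= 0 && newLength != Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        long minLength = Long.MAX_VALUE; // no BlockRecord matched bestState
        if (!isSaneRecoveredLength(minLength)) {
            // In real code this would abort the recovery instead of committing.
            System.out.println("refusing to commit bogus block length " + minLength);
        }
    }
}
```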
[jira] [Commented] (HDFS-9318) considerLoad factor can be improved
[ https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994976#comment-14994976 ] Hudson commented on HDFS-9318: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2518 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2518/]) HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java > considerLoad factor can be improved > --- > > Key: HDFS-9318 > URL: https://issues.apache.org/jira/browse/HDFS-9318 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch > > > Currently considerLoad avoids choosing nodes that are too active, so it helps > level the HDFS load across the cluster. Under normal conditions, this is > desired. However, when a cluster has a large percentage of nearly full nodes, > this can make it difficult to find good targets because the placement policy > wants to avoid the full nodes, but considerLoad wants to avoid the busy > less-full nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
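The interplay described above can be sketched as a single predicate: a node is "too busy" when its load exceeds factor × cluster average, and making that factor configurable lets operators trade load-leveling against placement flexibility on nearly full clusters. The {{2.0}} default and the method name below are assumptions for illustration, not the exact code in {{BlockPlacementPolicyDefault}}:

```java
public class ConsiderLoad {
    // Exclude a candidate datanode when its xceiver load exceeds
    // factor × cluster average. Historically the factor was effectively a
    // hard-coded 2.0; the fix makes it tunable.
    public static boolean excludeForLoad(double nodeLoad, double avgLoad, double factor) {
        return nodeLoad > factor * avgLoad;
    }

    public static void main(String[] args) {
        // With factor 2.0, a node at 5 xceivers vs a cluster average of 2 is
        // skipped; raising the factor to 3.0 lets placement use it — useful
        // when the only non-full nodes are also the busy ones.
        System.out.println(excludeForLoad(5, 2, 2.0)); // true
        System.out.println(excludeForLoad(5, 2, 3.0)); // false
    }
}
```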
[jira] [Updated] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
[ https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-9379: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Target Version/s: (was: 2.8.0) Status: Resolved (was: Patch Available) Committed for 2.8.0. Thanks for the contribution [~liuml07]. > Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes > -- > > Key: HDFS-9379 > URL: https://issues.apache.org/jira/browse/HDFS-9379 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-9379.000.patch > > > Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on > sorted {{datanodes}} array in the lexicographical order of datanode's > {{xferAddr}}. > * There is an assertion of datanode's {{xferAddr}} lexicographical order when > filling the {{datanodes}}, see [the > code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152]. > * When searching the datanode by {{DatanodeInfo}}, it uses binary search > against the {{datanodes}} array, see [the > code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187] > In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In > {{NNThroughputBenchmark}}, the port is simply _the index of the tiny > datanode_ plus one. > The problem here is that, when there are more than 9 tiny datanodes > ({{numThreads}}), the lexicographical order of datanode's {{xferAddr}} will > be invalid as the string value of datanode index is not in lexicographical > order any more. For example, > {code} > ... > 192.168.54.40:8 > 192.168.54.40:9 > 192.168.54.40:10 > 192.168.54.40:11 > ... 
> {code} > {{192.168.54.40:9}} is greater than {{192.168.54.40:10}}. The assertion will > fail and the binary search won't work. > The simple fix is to calculate the datanode index by port directly, instead > of using binary search. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
[ https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994954#comment-14994954 ] Arpit Agarwal commented on HDFS-9379: - Thanks for confirming you tested it manually. I will commit this shortly. > Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes > -- > > Key: HDFS-9379 > URL: https://issues.apache.org/jira/browse/HDFS-9379 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9379.000.patch > > > Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on > sorted {{datanodes}} array in the lexicographical order of datanode's > {{xferAddr}}. > * There is an assertion of datanode's {{xferAddr}} lexicographical order when > filling the {{datanodes}}, see [the > code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152]. > * When searching the datanode by {{DatanodeInfo}}, it uses binary search > against the {{datanodes}} array, see [the > code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187] > In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In > {{NNThroughputBenchmark}}, the port is simply _the index of the tiny > datanode_ plus one. > The problem here is that, when there are more than 9 tiny datanodes > ({{numThreads}}), the lexicographical order of datanode's {{xferAddr}} will > be invalid as the string value of datanode index is not in lexicographical > order any more. For example, > {code} > ... > 192.168.54.40:8 > 192.168.54.40:9 > 192.168.54.40:10 > 192.168.54.40:11 > ... > {code} > {{192.168.54.40:9}} is greater than {{192.168.54.40:10}}. The assertion will > fail and the binary search won't work. 
> The simple fix is to calculate the datanode index by port directly, instead > of using binary search. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy
[ https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-8905: Status: Patch Available (was: Open) > Refactor DFSInputStream#ReaderStrategy > -- > > Key: HDFS-8905 > URL: https://issues.apache.org/jira/browse/HDFS-8905 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Attachments: HDFS-8905-HDFS-7285-v1.patch, HDFS-8905-v2.patch > > > The DFSInputStream#ReaderStrategy family doesn't look very good. This refactors it a > little bit to make the classes make more sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy
[ https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-8905: Status: Open (was: Patch Available) > Refactor DFSInputStream#ReaderStrategy > -- > > Key: HDFS-8905 > URL: https://issues.apache.org/jira/browse/HDFS-8905 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Attachments: HDFS-8905-HDFS-7285-v1.patch, HDFS-8905-v2.patch > > > The DFSInputStream#ReaderStrategy family doesn't look very good. This refactors it a > little bit to make the classes make more sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy
[ https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-8905: Fix Version/s: (was: HDFS-7285) > Refactor DFSInputStream#ReaderStrategy > -- > > Key: HDFS-8905 > URL: https://issues.apache.org/jira/browse/HDFS-8905 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Attachments: HDFS-8905-HDFS-7285-v1.patch, HDFS-8905-v2.patch > > > The DFSInputStream#ReaderStrategy family doesn't look very good. This refactors it a > little bit to make the classes make more sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9398) Make ByteArrayManager log message in one-line format
[ https://issues.apache.org/jira/browse/HDFS-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994868#comment-14994868 ] Hadoop QA commented on HDFS-9398: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 32s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 7s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_79. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 21m 20s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-07 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771143/HDFS-9398.000.patch | | JIRA Issue | HDFS-9398 | | Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile | | uname | Linux 0a08e6ac7939 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/patchprocess/apache-yetus-ee5baeb/precommit/personality/hadoop.sh | | git revision | trunk / bf6aa30 | | Default Java | 1.7.0_79 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_60 /usr/lib
[jira] [Commented] (HDFS-9364) Unnecessary DNS resolution attempts when creating NameNodeProxies
[ https://issues.apache.org/jira/browse/HDFS-9364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994859#comment-14994859 ] Xiao Chen commented on HDFS-9364: - Thanks [~zhz], attached patch 4 with the fix. > Unnecessary DNS resolution attempts when creating NameNodeProxies > - > > Key: HDFS-9364 > URL: https://issues.apache.org/jira/browse/HDFS-9364 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, performance >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9364.001.patch, HDFS-9364.002.patch, > HDFS-9364.003.patch, HDFS-9364.004.patch > > > When creating NameNodeProxies, we always try to DNS-resolve namenode URIs. > This is unnecessary if the URI is logical, and may be significantly slow if > the DNS is having problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
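The description's point — skip DNS resolution when the URI authority is a logical HA nameservice rather than a real host — can be sketched with plain {{java.net.URI}}. The nameservice set below stands in for what HDFS reads from configuration (e.g. the {{dfs.nameservices}} key), and the method name is hypothetical, not the actual proxy-creation API:

```java
import java.net.URI;
import java.util.Set;

public class ProxyUriCheck {
    // Only DNS-resolve a namenode URI when its authority is a real host.
    // A logical URI like hdfs://mycluster names an HA nameservice, so any
    // DNS lookup on it is wasted work (and slow if DNS is unhealthy).
    public static boolean needsDnsResolution(URI nnUri, Set<String> nameservices) {
        return !nameservices.contains(nnUri.getHost());
    }

    public static void main(String[] args) {
        Set<String> nameservices = Set.of("mycluster");
        System.out.println(needsDnsResolution(
            URI.create("hdfs://mycluster"), nameservices));              // false: logical
        System.out.println(needsDnsResolution(
            URI.create("hdfs://nn1.example.com:8020"), nameservices));   // true: real host
    }
}
```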
[jira] [Updated] (HDFS-9364) Unnecessary DNS resolution attempts when creating NameNodeProxies
[ https://issues.apache.org/jira/browse/HDFS-9364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9364: Attachment: HDFS-9364.004.patch > Unnecessary DNS resolution attempts when creating NameNodeProxies > - > > Key: HDFS-9364 > URL: https://issues.apache.org/jira/browse/HDFS-9364 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, performance >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9364.001.patch, HDFS-9364.002.patch, > HDFS-9364.003.patch, HDFS-9364.004.patch > > > When creating NameNodeProxies, we always try to DNS-resolve namenode URIs. > This is unnecessary if the URI is logical, and may be significantly slow if > the DNS is having problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9399) Ability to disable HDFS browsing via browseDirectory.jsp, make it configurable
Raghu C Doppalapudi created HDFS-9399: - Summary: Ability to disable HDFS browsing via browseDirectory.jsp, make it configurable Key: HDFS-9399 URL: https://issues.apache.org/jira/browse/HDFS-9399 Project: Hadoop HDFS Issue Type: Bug Reporter: Raghu C Doppalapudi Assignee: Raghu C Doppalapudi Priority: Minor Currently there is no config property available in HDFS to disable the file browsing capability. Make it configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.
[ https://issues.apache.org/jira/browse/HDFS-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994814#comment-14994814 ] Mingliang Liu commented on HDFS-9394: - Test {{org.apache.hadoop.hdfs.TestRollingUpgradeRollback}} fails in branch-2. All other tests can pass locally. Seem unrelated? > branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader > initialization, because HftpFileSystem is missing. > > > Key: HDFS-9394 > URL: https://issues.apache.org/jira/browse/HDFS-9394 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Chris Nauroth >Assignee: Mingliang Liu >Priority: Critical > Attachments: HDFS-9394.000.branch-2.patch > > > On branch-2, hadoop-hdfs-client contains a {{FileSystem}} service descriptor > that lists {{HftpFileSystem}} and {{HsftpFileSystem}}. These classes do not > reside in hadoop-hdfs-client. Instead, they reside in hadoop-hdfs. If the > application has hadoop-hdfs-client.jar on the classpath, but not > hadoop-hdfs.jar, then this can cause a {{ServiceConfigurationError}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in client package
[ https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994784#comment-14994784 ] Mingliang Liu commented on HDFS-8971: - Thanks for your suggestion [~szetszwo]. I filed [HDFS-9398] to track the effort of reverting changes in {{ByteArrayManager}} regarding the log message. > Remove guards when calling LOG.debug() and LOG.trace() in client package > > > Key: HDFS-8971 > URL: https://issues.apache.org/jira/browse/HDFS-8971 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: build >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-8971.000.patch, HDFS-8971.001.patch > > > We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to > {{hadoop-hdfs-client}} module in JIRA > [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and > [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951], and > {{BlockReader}} in > [HDFS-8925|https://issues.apache.org/jira/browse/HDFS-8925]. Meanwhile, we > also replaced the _log4j_ log with _slf4j_ logger. There were existing code > in the client package to guard the log when calling {{LOG.debug()}} and > {{LOG.trace()}}, e.g. in {{ShortCircuitCache.java}}, we have code like this: > {code:title=Trace with guards|borderStyle=solid} > 724if (LOG.isTraceEnabled()) { > 725 LOG.trace(this + ": found waitable for " + key); > 726} > {code} > In _slf4j_, this kind of guard is not necessary. We should clean the code by > removing the guard from the client package. > {code:title=Trace without guards|borderStyle=solid} > 724LOG.trace("{}: found waitable for {}", this, key); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
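Why the guards in HDFS-8971 became redundant can be shown with a toy logger (this is an illustration of the principle, not slf4j itself): a parameterized call defers message assembly until the level check passes inside the logger, so a disabled `LOG.trace()` costs only a varargs call.

```java
// Toy logger illustrating why slf4j's parameterized calls make
// isDebugEnabled()/isTraceEnabled() guards redundant: the "{}" template
// is expanded only after the level check succeeds inside the logger.
public class LazyLogDemo {
    static boolean traceEnabled = false;
    static int messagesFormatted = 0;

    static void trace(String template, Object... args) {
        if (!traceEnabled) {
            return; // the same cheap early exit the caller's guard would do
        }
        messagesFormatted++; // formatting work happens only past this point
        StringBuilder sb = new StringBuilder();
        int from = 0, arg = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && arg < args.length) {
            sb.append(template, from, at).append(args[arg++]);
            from = at + 2;
        }
        sb.append(template.substring(from));
        System.out.println(sb);
    }

    public static void main(String[] args) {
        trace("{}: found waitable for {}", "cache", "key1"); // disabled: nothing formatted
        traceEnabled = true;
        trace("{}: found waitable for {}", "cache", "key1"); // formatted exactly once
        System.out.println("formatted=" + messagesFormatted);
    }
}
```

One caveat the pattern does not remove: argument expressions are still evaluated at the call site, so a guard remains worthwhile when an argument's `toString()` is itself expensive.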
[jira] [Updated] (HDFS-9398) Make ByteArraryManager log message in one-line format
[ https://issues.apache.org/jira/browse/HDFS-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9398: Attachment: HDFS-9398.000.patch > Make ByteArraryManager log message in one-line format > - > > Key: HDFS-9398 > URL: https://issues.apache.org/jira/browse/HDFS-9398 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9398.000.patch > > > Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use one-line > message. It's for sure easy to read, especially in case of multiple-threads. > The easy fix is to use the old format before [HDFS-8971]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9398) Make ByteArraryManager log message in one-line format
[ https://issues.apache.org/jira/browse/HDFS-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9398: Status: Patch Available (was: Open) > Make ByteArraryManager log message in one-line format > - > > Key: HDFS-9398 > URL: https://issues.apache.org/jira/browse/HDFS-9398 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9398.000.patch > > > Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use one-line > message. It's for sure easy to read, especially in case of multiple-threads. > The easy fix is to use the old format before [HDFS-8971]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9398) Make ByteArraryManager log message in one-line format
[ https://issues.apache.org/jira/browse/HDFS-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9398: Description: Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use one-line message. It's for sure easy to read, especially in case of multiple-threads. The easy fix is to use the old format before [HDFS-8971]. (was: Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use one-line message in ByteArrayManager. It's for sure easy to read, especially in case of multiple-threads. The easy fix is to use the old format before [HDFS-8971].) > Make ByteArraryManager log message in one-line format > - > > Key: HDFS-9398 > URL: https://issues.apache.org/jira/browse/HDFS-9398 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use one-line > message. It's for sure easy to read, especially in case of multiple-threads. > The easy fix is to use the old format before [HDFS-8971]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9398) Make ByteArraryManager log message in one-line format
Mingliang Liu created HDFS-9398: --- Summary: Make ByteArraryManager log message in one-line format Key: HDFS-9398 URL: https://issues.apache.org/jira/browse/HDFS-9398 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Mingliang Liu Assignee: Mingliang Liu Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use one-line message in ByteArrayManager. It's for sure easy to read, especially in case of multiple-threads. The easy fix is to use the old format before [HDFS-8971]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9398) Make ByteArraryManager log message in one-line format
[ https://issues.apache.org/jira/browse/HDFS-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9398: Issue Type: Improvement (was: Bug) > Make ByteArraryManager log message in one-line format > - > > Key: HDFS-9398 > URL: https://issues.apache.org/jira/browse/HDFS-9398 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use one-line > message in ByteArrayManager. It's for sure easy to read, especially in case > of multiple-threads. The easy fix is to use the old format before [HDFS-8971]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.
[ https://issues.apache.org/jira/browse/HDFS-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994771#comment-14994771 ] Hadoop QA commented on HDFS-9394: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 8s {color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s {color} | {color:green} branch-2 passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s {color} | {color:green} branch-2 passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 32s {color} | {color:green} branch-2 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 54s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in branch-2 has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 46s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client in branch-2 has 5 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s {color} | {color:green} branch-2 passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s {color} | {color:green} branch-2 passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 21s {color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 9s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 40s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 11s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 10s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_79. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s {color} | {color:red} Patch generated 58 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 152m 43s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_60 Failed junit tests | hadoop.hdfs.server.namenode.TestCacheDirectives | | | hadoop.hdfs.TestRollingUpgradeRollback | | | hadoop.hdfs.TestDistributedFileSystem
[jira] [Updated] (HDFS-9397) Fix typo for readChecksum() LOG.warn in BlockSender.java
[ https://issues.apache.org/jira/browse/HDFS-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-9397: --- Assignee: Nicole Pazmany > Fix typo for readChecksum() LOG.warn in BlockSender.java > > > Key: HDFS-9397 > URL: https://issues.apache.org/jira/browse/HDFS-9397 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Enrique Flores >Assignee: Nicole Pazmany >Priority: Trivial > Attachments: HDFS-9397.patch > > > typo for word "verify" found in: > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java#L647 > > {code} > LOG.warn(" Could not read or failed to veirfy checksum for data" > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994756#comment-14994756 ] Hudson commented on HDFS-6481: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2578 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2578/]) HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 0b18e5e8c69b40c9a446fff448d38e0dd10cb45e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java > DatanodeManager#getDatanodeStorageInfos() should check the length of > storageIDs > --- > > Key: HDFS-6481 > URL: https://issues.apache.org/jira/browse/HDFS-6481 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Ted Yu >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 2.7.3 > > Attachments: h6481_20151105.patch, hdfs-6481-v1.txt > > > Ian Brooks reported the following stack trace: > {code} > 2014-06-03 13:05:03,915 WARN [DataStreamer for file > /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200 > block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] > hdfs.DFSClient: DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): > 0 > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594) > at > 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266) > at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919) > at > org.apache.hadoop.hdfs.DFSOu
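The issue title — check the length of storageIDs — points at the precondition missing from the stack trace above: parallel datanode/storage arrays indexed without a length check. A hedged sketch of that precondition, using illustrative names rather than the actual Hadoop signature:

```java
import java.util.Arrays;

// Sketch of the HDFS-6481 precondition: getAdditionalDatanode() passes
// parallel datanodeID/storageID arrays, and indexing storageIDs[i]
// without checking its length is what produced the
// ArrayIndexOutOfBoundsException above. Illustrative names only.
public class StorageInfoLookup {

    static String[] getDatanodeStorageInfos(String[] datanodeIDs, String[] storageIDs) {
        if (datanodeIDs.length != storageIDs.length) {
            throw new IllegalArgumentException(
                "Mismatched lengths: datanodeIDs=" + datanodeIDs.length
                + " storageIDs=" + storageIDs.length);
        }
        String[] infos = new String[datanodeIDs.length];
        for (int i = 0; i < datanodeIDs.length; i++) {
            infos[i] = datanodeIDs[i] + "/" + storageIDs[i]; // safe: lengths match
        }
        return infos;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            getDatanodeStorageInfos(new String[]{"dn1"}, new String[]{"s1"})));
        try {
            getDatanodeStorageInfos(new String[]{"dn1"}, new String[]{});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the request with a clear message turns a server-side array bound crash into a diagnosable client error.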
[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery
[ https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994755#comment-14994755 ] Hudson commented on HDFS-9236: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2578 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2578/]) HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java > Missing sanity check for block size during block recovery > - > > Key: HDFS-9236 > URL: https://issues.apache.org/jira/browse/HDFS-9236 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu > Fix For: 2.8.0 > > Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, > HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, > HDFS-9236.006.patch, HDFS-9236.007.patch > > > Ran into an issue while running test against faulty data-node code. > Currently in DataNode.java: > {code:java} > /** Block synchronization */ > void syncBlock(RecoveringBlock rBlock, > List syncList) throws IOException { > … > // Calculate the best available replica state. 
> ReplicaState bestState = ReplicaState.RWR; > … > // Calculate list of nodes that will participate in the recovery > // and the new block size > List participatingList = new ArrayList(); > final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId, > -1, recoveryId); > switch(bestState) { > … > case RBW: > case RWR: > long minLength = Long.MAX_VALUE; > for(BlockRecord r : syncList) { > ReplicaState rState = r.rInfo.getOriginalReplicaState(); > if(rState == bestState) { > minLength = Math.min(minLength, r.rInfo.getNumBytes()); > participatingList.add(r); > } > } > newBlock.setNumBytes(minLength); > break; > … > } > … > nn.commitBlockSynchronization(block, > newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false, > datanodes, storages); > } > {code} > This code is called by the DN coordinating the block recovery. In the above > case, it is possible for none of the rState (reported by DNs with copies of > the replica being recovered) to match the bestState. This can either be > caused by faulty DN code or stale/modified/corrupted files on DN. When this > happens the DN will end up reporting the minLengh of Long.MAX_VALUE. > Unfortunately there is no check on the NN for replica length. 
See > FSNamesystem.java: > {code:java} > void commitBlockSynchronization(ExtendedBlock oldBlock, > long newgenerationstamp, long newlength, > boolean closeFile, boolean deleteblock, DatanodeID[] newtargets, > String[] newtargetstorages) throws IOException { > … > if (deleteblock) { > Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock); > boolean remove = iFile.removeLastBlock(blockToDel) != null; > if (remove) { > blockManager.removeBlock(storedBlock); > } > } else { > // update last block > if(!copyTruncate) { > storedBlock.setGenerationStamp(newgenerationstamp); > > // XXX block length is updated without any check <<< storedBlock.setNumBytes(newlength); > } > … > if (closeFile) { > LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock > + ", file=" + src > + (copyTruncate ? ", newBlock=" + truncatedBlock > : ", newgenerationstamp=" + newgenerationstamp) > + ", newlength=" + newlength > + ", newtargets=" + Arrays.asList(newtargets) + ") successful"); > } else { > LOG.info("commitBlockSynchronization(" + oldBlock + ") successful"); > } > } > {code} > After this point the block length becomes Long.MAX_VALUE. Any subsequent > block report (even with correct length) will cause the block to be marked as > corrupted. Since this is block could be the last block of the file. If this > happens and the client goes away, NN won’t be able to recover the lease and > close the file because the last block is under-replicated. > I believe we need to have a sanity check for block size on both DN and NN to > prevent such case from happening. -- This message was sent by Atlassian JIRA (v6.3
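The failure mode described above — `minLength` never lowered from `Long.MAX_VALUE` because no replica matched `bestState` — suggests a guard along these lines. This is a self-contained sketch of the hazard and check, not the committed patch:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the HDFS-9236 hazard: if no replica's state matches
// bestState, minLength stays at Long.MAX_VALUE and that bogus value
// would be committed as the block length. A sanity check fails loudly
// before commitBlockSynchronization(). Illustrative only.
public class RecoveryLengthCheck {

    static long safeRecoveryLength(List<long[]> replicas, int bestState) {
        long minLength = Long.MAX_VALUE;
        for (long[] r : replicas) {          // r[0] = replica state, r[1] = numBytes
            if (r[0] == bestState) {
                minLength = Math.min(minLength, r[1]);
            }
        }
        if (minLength == Long.MAX_VALUE) {
            // No replica participated: abort instead of committing
            // Long.MAX_VALUE as the new block length.
            throw new IllegalStateException(
                "No replica matched bestState=" + bestState);
        }
        return minLength;
    }

    public static void main(String[] args) {
        List<long[]> replicas = new ArrayList<>();
        replicas.add(new long[]{2, 1024});
        replicas.add(new long[]{2, 512});
        System.out.println(safeRecoveryLength(replicas, 2)); // smallest matching length: 512
        try {
            safeRecoveryLength(replicas, 3); // no replica in state 3
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

As the description argues, the same check is worth mirroring on the NameNode side, since it currently trusts `newlength` unconditionally.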
[jira] [Commented] (HDFS-9318) considerLoad factor can be improved
[ https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994757#comment-14994757 ] Hudson commented on HDFS-9318: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2578 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2578/]) HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml > considerLoad factor can be improved > --- > > Key: HDFS-9318 > URL: https://issues.apache.org/jira/browse/HDFS-9318 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch > > > Currently considerLoad avoids choosing nodes that are too active, so it helps > level the HDFS load across the cluster. Under normal conditions, this is > desired. However, when a cluster has a large percentage of nearly full nodes, > this can make it difficult to find good targets because the placement policy > wants to avoid the full nodes, but considerLoad wants to avoid the busy > less-full nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
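As a rough sketch of the mechanism being tuned: the placement policy skips a node whose transfer (xceiver) load exceeds some multiple of the cluster average, and the patch makes that multiple configurable. Names below are illustrative, not the actual `BlockPlacementPolicyDefault` code:

```java
// Sketch of the considerLoad check discussed in HDFS-9318: a node is
// skipped as a replication target when its xceiver load exceeds
// loadFactor times the cluster-wide average. Making the factor tunable
// lets clusters full of near-capacity nodes relax it. Illustrative names.
public class ConsiderLoadSketch {

    static boolean isGoodTarget(int nodeXceivers, double avgXceivers, double loadFactor) {
        // With loadFactor = 2.0 this mirrors the historical
        // "twice the average" rule; a larger factor admits busier nodes.
        return nodeXceivers <= loadFactor * avgXceivers;
    }

    public static void main(String[] args) {
        double avg = 10.0;
        System.out.println(isGoodTarget(15, avg, 2.0)); // under 2x average: true
        System.out.println(isGoodTarget(25, avg, 2.0)); // over 2x average: false
        System.out.println(isGoodTarget(25, avg, 3.0)); // relaxed factor admits it: true
    }
}
```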
[jira] [Commented] (HDFS-9258) NN should indicate which nodes are stale
[ https://issues.apache.org/jira/browse/HDFS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994749#comment-14994749 ] Hadoop QA commented on HDFS-9258: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 16s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-hdfs-project/hadoop-hdfs (total was 451, now 452). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 36s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 52s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_79. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 22s {color} | {color:red} Patch generated 56 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 146m 33s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_60 Failed junit tests | hadoop.hdfs.TestDFSClientRetries | | | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.TestFileCreationClient | | | hadoop.hdfs.TestDFSUpgradeFromImage | | | hadoop.hdfs.server.datanode.TestBlockScanner | | | hadoop.hdfs.TestFileCreation | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery | | JDK v1.7.0_79 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap | | | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | | | hadoop.hdfs.shortcircuit.TestShortCircuitCache | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshot | | | hadoop.hdfs.TestLeaseRecovery2 | \\ \\ || Subsystem |
[jira] [Commented] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in client package
[ https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994743#comment-14994743 ] Tsz Wo Nicholas Sze commented on HDFS-8971: --- Sure, please file a JIRA to revert the change. Thanks! > Remove guards when calling LOG.debug() and LOG.trace() in client package > > > Key: HDFS-8971 > URL: https://issues.apache.org/jira/browse/HDFS-8971 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: build >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-8971.000.patch, HDFS-8971.001.patch > > > We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to > {{hadoop-hdfs-client}} module in JIRA > [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and > [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951], and > {{BlockReader}} in > [HDFS-8925|https://issues.apache.org/jira/browse/HDFS-8925]. Meanwhile, we > also replaced the _log4j_ log with _slf4j_ logger. There were existing code > in the client package to guard the log when calling {{LOG.debug()}} and > {{LOG.trace()}}, e.g. in {{ShortCircuitCache.java}}, we have code like this: > {code:title=Trace with guards|borderStyle=solid} > 724if (LOG.isTraceEnabled()) { > 725 LOG.trace(this + ": found waitable for " + key); > 726} > {code} > In _slf4j_, this kind of guard is not necessary. We should clean the code by > removing the guard from the client package. > {code:title=Trace without guards|borderStyle=solid} > 724LOG.trace("{}: found waitable for {}", this, key); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9397) Fix typo for readChecksum() LOG.warn in BlockSender.java
[ https://issues.apache.org/jira/browse/HDFS-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrique Flores updated HDFS-9397: - Attachment: HDFS-9397.patch attaching proposed fix. > Fix typo for readChecksum() LOG.warn in BlockSender.java > > > Key: HDFS-9397 > URL: https://issues.apache.org/jira/browse/HDFS-9397 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Enrique Flores >Priority: Trivial > Attachments: HDFS-9397.patch > > > typo for word "verify" found in: > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java#L647 > > {code} > LOG.warn(" Could not read or failed to veirfy checksum for data" > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9397) Fix typo for readChecksum() LOG.warn in BlockSender.java
Enrique Flores created HDFS-9397: Summary: Fix typo for readChecksum() LOG.warn in BlockSender.java Key: HDFS-9397 URL: https://issues.apache.org/jira/browse/HDFS-9397 Project: Hadoop HDFS Issue Type: Bug Reporter: Enrique Flores Priority: Trivial typo for word "verify" found in: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java#L647 {code} LOG.warn(" Could not read or failed to veirfy checksum for data" {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9318) considerLoad factor can be improved
[ https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994727#comment-14994727 ] Hudson commented on HDFS-9318: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #638 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/638/]) HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java > considerLoad factor can be improved > --- > > Key: HDFS-9318 > URL: https://issues.apache.org/jira/browse/HDFS-9318 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch > > > Currently considerLoad avoids choosing nodes that are too active, so it helps > level the HDFS load across the cluster. Under normal conditions, this is > desired. However, when a cluster has a large percentage of nearly full nodes, > this can make it difficult to find good targets because the placement policy > wants to avoid the full nodes, but considerLoad wants to avoid the busy > less-full nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
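For context, the check being tuned can be sketched as follows (a hedged stand-in based on the issue description, not the actual {{BlockPlacementPolicyDefault}} code; the method name and the factor value are illustrative): a candidate node is rejected when its active transfer count exceeds a configurable multiple of the average load across the cluster.

```java
// Illustrative considerLoad-style check: a node is "too busy" when its
// xceiver count exceeds factor * (average cluster load per node).
public class ConsiderLoad {
    static boolean isOverloaded(int nodeXceiverCount, double totalLoad,
                                int numNodes, double factor) {
        double avgLoad = numNodes == 0 ? 0.0 : totalLoad / numNodes;
        return nodeXceiverCount > factor * avgLoad;
    }

    public static void main(String[] args) {
        // average load = 100 / 10 = 10; with factor 2.0 the cutoff is 20
        System.out.println(isOverloaded(25, 100.0, 10, 2.0)); // too busy
        System.out.println(isOverloaded(15, 100.0, 10, 2.0)); // acceptable
    }
}
```

On a cluster of nearly full nodes, raising the factor relaxes the cutoff, which is the knob the patch introduces.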
[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery
[ https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994726#comment-14994726 ] Hudson commented on HDFS-9236: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #638 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/638/]) HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Missing sanity check for block size during block recovery > - > > Key: HDFS-9236 > URL: https://issues.apache.org/jira/browse/HDFS-9236 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu > Fix For: 2.8.0 > > Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, > HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, > HDFS-9236.006.patch, HDFS-9236.007.patch > > > Ran into an issue while running a test against faulty data-node code. > Currently in DataNode.java: > {code:java} > /** Block synchronization */ > void syncBlock(RecoveringBlock rBlock, > List<BlockRecord> syncList) throws IOException { > … > // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR; > … > // Calculate list of nodes that will participate in the recovery > // and the new block size > List<BlockRecord> participatingList = new ArrayList<BlockRecord>(); > final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId, > -1, recoveryId); > switch(bestState) { > … > case RBW: > case RWR: > long minLength = Long.MAX_VALUE; > for(BlockRecord r : syncList) { > ReplicaState rState = r.rInfo.getOriginalReplicaState(); > if(rState == bestState) { > minLength = Math.min(minLength, r.rInfo.getNumBytes()); > participatingList.add(r); > } > } > newBlock.setNumBytes(minLength); > break; > … > } > … > nn.commitBlockSynchronization(block, > newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false, > datanodes, storages); > } > {code} > This code is called by the DN coordinating the block recovery. In the above > case, it is possible for none of the rStates (reported by DNs with copies of > the replica being recovered) to match the bestState. This can either be > caused by faulty DN code or stale/modified/corrupted files on the DN. When this > happens the DN will end up reporting a minLength of Long.MAX_VALUE. > Unfortunately there is no check on the NN for replica length.
See > FSNamesystem.java: > {code:java} > void commitBlockSynchronization(ExtendedBlock oldBlock, > long newgenerationstamp, long newlength, > boolean closeFile, boolean deleteblock, DatanodeID[] newtargets, > String[] newtargetstorages) throws IOException { > … > if (deleteblock) { > Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock); > boolean remove = iFile.removeLastBlock(blockToDel) != null; > if (remove) { > blockManager.removeBlock(storedBlock); > } > } else { > // update last block > if(!copyTruncate) { > storedBlock.setGenerationStamp(newgenerationstamp); > > // XXX block length is updated without any check <<< > storedBlock.setNumBytes(newlength); > } > … > if (closeFile) { > LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock > + ", file=" + src > + (copyTruncate ? ", newBlock=" + truncatedBlock > : ", newgenerationstamp=" + newgenerationstamp) > + ", newlength=" + newlength > + ", newtargets=" + Arrays.asList(newtargets) + ") successful"); > } else { > LOG.info("commitBlockSynchronization(" + oldBlock + ") successful"); > } > } > {code} > After this point the block length becomes Long.MAX_VALUE. Any subsequent > block report (even with the correct length) will cause the block to be marked as > corrupted. Since this block could be the last block of the file, if this > happens and the client goes away, the NN won’t be able to recover the lease and > close the file because the last block is under-replicated. > I believe we need to have a sanity check for block size on both the DN and NN to > prevent such a case from happening. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
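A minimal sketch of the kind of sanity check the reporter proposes (the method name is hypothetical, not the actual Hadoop API; it only shows the idea of rejecting the {{Long.MAX_VALUE}} sentinel before committing a recovered block length):

```java
import java.io.IOException;

// Illustrative guard: reject a commitBlockSynchronization-style update
// whose new length is negative or the Long.MAX_VALUE sentinel that is
// left when no replica matched the best recovery state.
public class BlockLengthCheck {
    static void validateRecoveredLength(long newLength) throws IOException {
        if (newLength < 0 || newLength == Long.MAX_VALUE) {
            throw new IOException(
                "Rejecting block recovery with invalid length " + newLength);
        }
    }

    public static void main(String[] args) throws Exception {
        validateRecoveredLength(1024); // plausible length: accepted
        try {
            validateRecoveredLength(Long.MAX_VALUE); // sentinel: rejected
            throw new AssertionError("should have thrown");
        } catch (IOException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```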
[jira] [Commented] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.
[ https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994725#comment-14994725 ] Hadoop QA commented on HDFS-7163: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 5s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 1s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 50s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 9s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 29s {color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-hdfs-project (total was 58, now 59). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 47s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 48m 41s {color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 56s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_79. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s {color} | {color:red} Patch generated 58 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 128m 17s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_60 Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-06 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771091/HDFS-7163.003.patch | | JIRA Issue | HDFS-7163 | | Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile |
[jira] [Commented] (HDFS-2261) AOP unit tests are not getting compiled or run
[ https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994720#comment-14994720 ] Karthik Kambatla commented on HDFS-2261: +1, pending Jenkins. Thanks for taking this up, [~wheat9]. > AOP unit tests are not getting compiled or run > --- > > Key: HDFS-2261 > URL: https://issues.apache.org/jira/browse/HDFS-2261 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha, 2.0.4-alpha > Environment: > https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console > -compile-fault-inject ant target >Reporter: Giridharan Kesavan >Priority: Minor > Attachments: HDFS-2261.000.patch, hdfs-2261.patch > > > The tests in src/test/aop are not getting compiled or run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6481: Fix Version/s: (was: 2.7.2) 2.7.3 > DatanodeManager#getDatanodeStorageInfos() should check the length of > storageIDs > --- > > Key: HDFS-6481 > URL: https://issues.apache.org/jira/browse/HDFS-6481 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Ted Yu >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 2.7.3 > > Attachments: h6481_20151105.patch, hdfs-6481-v1.txt > > > Ian Brooks reported the following stack trace: > {code} > 2014-06-03 13:05:03,915 WARN [DataStreamer for file > /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200 > block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] > hdfs.DFSClient: DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): > 0 > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266) > at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919) > at > 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475) > 2014-06-03 13:05:48,489 ERROR [RpcServer.handler=22,port=16020] wal.FSHLog: > syncer encountered error, will retry. txid=211 > org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBounds
[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994706#comment-14994706 ] Jing Zhao commented on HDFS-9129: - The latest patch looks good to me overall. Here are my comments: # Let's still define {{BlockManagerSafeMode#status}} as a private field, and provide a getter/setter if necessary. In this way we can have better control of its value. Similarly for {{blockTotal}} and {{blockSafe}}. # The following two initializations may be wrong: with the patch the safemode object is created when constructing BlockManager, before loading the fsimage and editlog from disk. {code} private final long startTime = monotonicNow(); {code} {code} private long lastStatusReport = monotonicNow(); {code} # {{shouldIncrementallyTrackBlocks}} is actually determined by {{haEnabled}}, thus it looks like it can be declared as final and {{isSafeModeTrackingBlocks}} can be simplified. # {{BlockManagerSafeMode#setBlockTotal}} currently does two things: 1) updating threshold numbers, and 2) triggering a mode check. We can separate #2 out of this method, and then {{activate}} does not need to do an unnecessary check. # {{reached}} can be renamed to {{reachedTime}}. # In the old safemode semantics, once entering the extension state, the NN never comes back to the normal safemode state, but can keep waiting in the extension state if the threshold is not met again. The current implementation changes this semantic. It's better to avoid this change here. {code} case EXTENSION: if (!areThresholdsMet()) { // EXTENSION -> PENDING_THRESHOLD status = BMSafeModeStatus.PENDING_THRESHOLD; } {code} # The following code can be simplified. {code} if (status == BMSafeModeStatus.OFF) { return; } if (!shouldIncrementallyTrackBlocks) { return; } {code} # In {{adjustBlockTotals}}, the {{setBlockTotal}} call should be out of the synchronized block. {code} synchronized (this) { ...
blockSafe += deltaSafe; setBlockTotal(blockTotal + deltaTotal); } {code} # Not caused by this patch, but since {{doConsistencyCheck}} sometimes is not protected by any lock (e.g., {{computeDatanodeWork}}), the total number of blocks retrieved from blockManager and used by the consistency check can be inaccurate. So I think here we can replace the AssertionError to a warning log message. # Let's still name the first parameter of {{incrementSafeBlockCount}} as "storageNum". # In {{decrementSafeBlockCount}}, {{checkSafeMode}} only needs to be called when the first time the live replica number drops below the safe number. Thus {{checkSafeMode}} should be called within the if. {code} if (blockManager.countNodes(b).liveReplicas() == safeReplication - 1) { this.blockSafe--; } assert blockSafe >= 0; checkSafeMode(); {code} > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch, HDFS-9129.021.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
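The last review point (only trigger the safe-mode check when a block first drops below the safe replication threshold) can be sketched as follows; field and method names are illustrative stand-ins for {{BlockManagerSafeMode}}, not the patch's actual code:

```java
// Illustrative counter: decrement blockSafe and run the safe-mode check
// only at the moment the live replica count first drops below the
// safe-replication threshold, as the reviewer suggests.
public class SafeBlockCounter {
    private long blockSafe;
    private final int safeReplication;
    int checks = 0; // number of checkSafeMode() invocations (for illustration)

    SafeBlockCounter(long blockSafe, int safeReplication) {
        this.blockSafe = blockSafe;
        this.safeReplication = safeReplication;
    }

    void decrementSafeBlockCount(int liveReplicasAfterRemoval) {
        if (liveReplicasAfterRemoval == safeReplication - 1) {
            blockSafe--;
            assert blockSafe >= 0;
            checkSafeMode(); // called only on the first drop below threshold
        }
    }

    private void checkSafeMode() { checks++; }

    long getBlockSafe() { return blockSafe; }

    public static void main(String[] args) {
        SafeBlockCounter c = new SafeBlockCounter(5, 1);
        c.decrementSafeBlockCount(0); // first drop below threshold: counted
        c.decrementSafeBlockCount(2); // still above threshold: ignored
        System.out.println(c.getBlockSafe() + " safe blocks, " + c.checks + " check(s)");
    }
}
```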
[jira] [Commented] (HDFS-2261) AOP unit tests are not getting compiled or run
[ https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994700#comment-14994700 ] Haohui Mai commented on HDFS-2261: -- Rebase on the latest trunk. > AOP unit tests are not getting compiled or run > --- > > Key: HDFS-2261 > URL: https://issues.apache.org/jira/browse/HDFS-2261 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha, 2.0.4-alpha > Environment: > https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console > -compile-fault-inject ant target >Reporter: Giridharan Kesavan >Priority: Minor > Attachments: HDFS-2261.000.patch, hdfs-2261.patch > > > The tests in src/test/aop are not getting compiled or run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-2261) AOP unit tests are not getting compiled or run
[ https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-2261: - Attachment: HDFS-2261.000.patch > AOP unit tests are not getting compiled or run > --- > > Key: HDFS-2261 > URL: https://issues.apache.org/jira/browse/HDFS-2261 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha, 2.0.4-alpha > Environment: > https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console > -compile-fault-inject ant target >Reporter: Giridharan Kesavan >Priority: Minor > Attachments: HDFS-2261.000.patch, hdfs-2261.patch > > > The tests in src/test/aop are not getting compiled or run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-2261) AOP unit tests are not getting compiled or run
[ https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-2261: - Status: Patch Available (was: Open) > AOP unit tests are not getting compiled or run > --- > > Key: HDFS-2261 > URL: https://issues.apache.org/jira/browse/HDFS-2261 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.4-alpha, 2.0.0-alpha > Environment: > https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console > -compile-fault-inject ant target >Reporter: Giridharan Kesavan >Priority: Minor > Attachments: HDFS-2261.000.patch, hdfs-2261.patch > > > The tests in src/test/aop are not getting compiled or run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994679#comment-14994679 ] Bob Hansen commented on HDFS-9117: -- [~wheat9] - do you agree that we should be reading in xml streams that follow the conventions of the hdfs-config.xml files? e.g. configuration, property, name, value, and final stanzas? bq. The functionality is definitely helpful, but it can be provided as a utility helper instead of baking it into the main contract of libhdfs+. That was my intention in writing this class. A utility helper that would encapsulate reading config files from the field and producing a libhdfs++ Options object. That's what each version has done. I can strip it down to the API you provided, but I wonder what use case it will be serving then. > Config file reader / options classes for libhdfs++ > -- > > Key: HDFS-9117 > URL: https://issues.apache.org/jira/browse/HDFS-9117 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9117.HDFS-8707.001.patch, > HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, > HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, > HDFS-9117.HDFS-8707.006.patch, HDFS-9117.HDFS-8707.008.patch, > HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch, > HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, > HDFS-9117.HDFS-9288.007.patch > > > For environmental compatability with HDFS installations, libhdfs++ should be > able to read the configurations from Hadoop XML files and behave in line with > the Java implementation. > Most notably, machine names and ports should be readable from Hadoop XML > configuration files. > Similarly, an internal Options architecture for libhdfs++ should be developed > to efficiently transport the configuration information within the system. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
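The configuration/property/name/value layout discussed above can be parsed with a short, self-contained sketch (shown in JDK-only Java for illustration; libhdfs++ itself would implement this in C++, and this sketch deliberately ignores {{final}} stanzas and variable substitution):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Parse a Hadoop-style config document: <configuration> containing
// <property> elements, each with a <name> and a <value>.
public class HadoopXmlConfig {
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> conf = new HashMap<>();
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String name = text(p, "name");
                String value = text(p, "value");
                if (name != null && value != null) {
                    conf.put(name, value);
                }
            }
            return conf;
        } catch (Exception e) {
            throw new IllegalArgumentException("bad config XML", e);
        }
    }

    private static String text(Element parent, String tag) {
        NodeList nl = parent.getElementsByTagName(tag);
        return nl.getLength() == 0 ? null : nl.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
            + "<property><name>fs.defaultFS</name>"
            + "<value>hdfs://nn.example.com:8020</value></property>"
            + "</configuration>";
        System.out.println(parse(xml).get("fs.defaultFS"));
    }
}
```

The hostname in the example is a placeholder; a real hdfs-site.xml would of course carry the cluster's own values.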
[jira] [Commented] (HDFS-9369) Use ctest to run tests for hadoop-hdfs-native-client
[ https://issues.apache.org/jira/browse/HDFS-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994676#comment-14994676 ] Jing Zhao commented on HDFS-9369: - The change looks good to me. +1 > Use ctest to run tests for hadoop-hdfs-native-client > > > Key: HDFS-9369 > URL: https://issues.apache.org/jira/browse/HDFS-9369 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Attachments: HDFS-9369.000.patch > > > Currently we write special rules in {{pom.xml}} to run tests in > {{hadoop-hdfs-native-client}}. This jira proposes to run these tests using > ctest to simplify {{pom.xml}} and improve portability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)

[jira] [Updated] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.
[ https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-9267: Attachment: HDFS-9267.03.patch Hi, [~cmccabe]. I updated the patch to provide a {{ReplicaIterator}} class and refactor {{BlockPoolSlice}} to use it. The reason for using {{ReplicaIterator}} instead of {{java.util.Iterator}} is that it can throw {{IOException}} in {{next()}}. Could you give some feedback? Thanks a lot. > TestDiskError should get stored replicas through FsDatasetTestUtils. > > > Key: HDFS-9267 > URL: https://issues.apache.org/jira/browse/HDFS-9267 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.7.1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Minor > Attachments: HDFS-9267.00.patch, HDFS-9267.01.patch, > HDFS-9267.02.patch, HDFS-9267.03.patch > > > {{TestDiskError#testReplicationError}} scans local directories to verify > blocks and metadata files, which leaks the details of the {{FsDataset}} > implementation. > This JIRA will abstract the "scanning" operation to {{FsDatasetTestUtils}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
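The motivation for a custom iterator can be sketched as follows (interface and names are illustrative, not the patch's actual code): {{java.util.Iterator#next()}} cannot declare checked exceptions, so an iterator that reads replicas off disk needs its own interface where {{next()}} may throw {{IOException}}.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class ReplicaIteratorSketch {
    // Unlike java.util.Iterator, next() may throw a checked IOException,
    // letting disk errors surface at the call site.
    interface ReplicaIterator<T> {
        boolean hasNext() throws IOException;
        T next() throws IOException;
    }

    // In-memory stand-in: iterates a fixed list, failing on a marker value
    // to simulate an unreadable replica file.
    static ReplicaIterator<String> over(List<String> blocks) {
        return new ReplicaIterator<String>() {
            private int i = 0;
            public boolean hasNext() { return i < blocks.size(); }
            public String next() throws IOException {
                String b = blocks.get(i++);
                if (b.startsWith("corrupt")) {
                    throw new IOException("cannot read replica " + b);
                }
                return b;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        ReplicaIterator<String> it = over(Arrays.asList("blk_1", "blk_2"));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```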
[jira] [Commented] (HDFS-9395) getContentSummary is audit logged as success even if failed
[ https://issues.apache.org/jira/browse/HDFS-9395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994654#comment-14994654 ] Kihwal Lee commented on HDFS-9395: -- It's by design? HDFS-5163 > getContentSummary is audit logged as success even if failed > --- > > Key: HDFS-9395 > URL: https://issues.apache.org/jira/browse/HDFS-9395 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kuhu Shukla > > Audit logging is in the finally block along with the lock unlocking, so it > is always logged as success even for cases where a FileNotFoundException is > thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
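The bug pattern described, and the usual fix of tracking success explicitly before the finally block runs, can be sketched as follows (names are illustrative, not the actual {{FSNamesystem}} code):

```java
// Illustrates why auditing in the finally block records "success" even on
// failure, and one common fix: set a success flag on the happy path only.
public class AuditExample {
    static String lastAudit; // stands in for the audit log sink

    static long getContentSummaryBuggy(boolean fail) {
        try {
            if (fail) throw new RuntimeException("FileNotFound");
            return 42;
        } finally {
            lastAudit = "success"; // logged unconditionally: the bug
        }
    }

    static long getContentSummaryFixed(boolean fail) {
        boolean success = false;
        try {
            if (fail) throw new RuntimeException("FileNotFound");
            success = true;
            return 42;
        } finally {
            lastAudit = success ? "success" : "failure";
        }
    }

    public static void main(String[] args) {
        try { getContentSummaryBuggy(true); } catch (RuntimeException e) { }
        System.out.println("buggy audit after failure: " + lastAudit);
        try { getContentSummaryFixed(true); } catch (RuntimeException e) { }
        System.out.println("fixed audit after failure: " + lastAudit);
    }
}
```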
[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994596#comment-14994596 ] Hudson commented on HDFS-6481: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1371 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1371/]) HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 0b18e5e8c69b40c9a446fff448d38e0dd10cb45e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > DatanodeManager#getDatanodeStorageInfos() should check the length of > storageIDs > --- > > Key: HDFS-6481 > URL: https://issues.apache.org/jira/browse/HDFS-6481 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Ted Yu >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 2.7.2 > > Attachments: h6481_20151105.patch, hdfs-6481-v1.txt > > > Ian Brooks reported the following stack trace: > {code} > 2014-06-03 13:05:03,915 WARN [DataStreamer for file > /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200 > block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] > hdfs.DFSClient: DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): > 0 > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594) > at > 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266) > at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919) > at > org.apache.hadoop.hdfs.DFSOutputStream
[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994598#comment-14994598 ] Haohui Mai commented on HDFS-9117: -- bq. As an example, let's say we are writing a native replacement for the dfs tool using the native libhdfs++ codebase (not the libhdfs compatibility layer) that can do "-ls" and "-copyFromLocal", etc. To provide Least Astonishment for our consumers, they would expect that a properly configured Hadoop node [with the HADOOP_HOME pointing to /etc/hadoop-2.9.9 and its config files] could run "hdfspp -ls /tmp" and have it automatically find the NN and configure the communications parameters correctly to talk to their cluster. Unfortunately the assumption is broken in many ways -- it is fully implementation defined. For example, there are issues over whether {{HADOOP_HOME}} or {{HADOOP_PREFIX}} should be chosen. Configuration files are only required to be specified in {{CLASSPATH}}, not necessarily in the {{HADOOP_HOME}} directory. Different vendors might have changed their scripts and put the configuration in different places. Scripts evolve across versions. We have very different scripts between trunk and branch-2. While it is definitely useful in the libhdfs compatibility layer, I'm doubtful it should be added into the core part of the library due to all this complexity. Therefore I believe that the focus of the library should be providing mechanisms to interact with HDFS but not concrete policy (e.g., location of the configuration) on how to interact. We don't have any libraries yet that implement the protocols and mechanisms to interact with HDFS (which is the reusable part). The policy is highly customized in different environments but it can be worked around easily (which is the less reusable part). bq. 
given this context, do you agree that we need to support libhdfs++ compatibility with the hdfs-site.xml files that are already deployed at customer sites? There are two levels of APIs when you talk about libhdfs++ APIs. The core API focuses on providing mechanisms to interact with HDFS, such as implementing the Hadoop RPC and DataTransferProtocol. The API that you're referring to might be a convenience API for libhdfs++. The functionality is definitely helpful, but it can be provided as a utility helper instead of baking it into the main contract of libhdfs++. My suggestion is the following: 1. Focus in this jira on getting in the code that parses XML from strings (which is the core functionality of configuration parsing). It should not contain any file operations. 2. Separate the tasks of searching through paths, reading files, etc. into different jiras. For now it makes sense to put them along with the {{libhdfs}} compatibility layer. Since it's an implementation detail I believe we can quickly go through it. At a later point in time we can promote the code to a common library, once we have a proposal for what the libhdfs++ convenience APIs look like. 
> Config file reader / options classes for libhdfs++ > -- > > Key: HDFS-9117 > URL: https://issues.apache.org/jira/browse/HDFS-9117 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9117.HDFS-8707.001.patch, > HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, > HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, > HDFS-9117.HDFS-8707.006.patch, HDFS-9117.HDFS-8707.008.patch, > HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch, > HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, > HDFS-9117.HDFS-9288.007.patch > > > For environmental compatibility with HDFS installations, libhdfs++ should be > able to read the configurations from Hadoop XML files and behave in line with > the Java implementation. > Most notably, machine names and ports should be readable from Hadoop XML > configuration files. > Similarly, an internal Options architecture for libhdfs++ should be developed > to efficiently transport the configuration information within the system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9318) considerLoad factor can be improved
[ https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994597#comment-14994597 ] Hudson commented on HDFS-9318: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1371 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1371/]) HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java > considerLoad factor can be improved > --- > > Key: HDFS-9318 > URL: https://issues.apache.org/jira/browse/HDFS-9318 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch > > > Currently considerLoad avoids choosing nodes that are too active, so it helps > level the HDFS load across the cluster. Under normal conditions, this is > desired. However, when a cluster has a large percentage of nearly full nodes, > this can make it difficult to find good targets because the placement policy > wants to avoid the full nodes, but considerLoad wants to avoid the busy > less-full nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
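The tension described above comes down to a simple threshold: considerLoad rejects candidate targets whose active transfer count exceeds a multiple of the cluster mean (historically a hard-coded 2x; this patch concerns making that factor tunable). The sketch below is a toy model only; the class and method names are invented, and the real check lives in BlockPlacementPolicyDefault against live DatanodeDescriptor state.

```java
// Toy model of the considerLoad threshold (all names invented).
public class ConsiderLoadSketch {
    /** Accept a candidate only when its load is within factor * cluster mean. */
    static boolean isGoodTarget(int nodeXceiverCount, double avgXceiverCount,
                                double considerLoadFactor) {
        return nodeXceiverCount <= considerLoadFactor * avgXceiverCount;
    }

    public static void main(String[] args) {
        // With a mean of 4 transfers per node and the classic factor of 2.0:
        System.out.println(isGoodTarget(10, 4.0, 2.0)); // rejected: 10 > 8
        System.out.println(isGoodTarget(7, 4.0, 2.0));  // accepted: 7 <= 8
    }
}
```

When the lightly loaded nodes are also the nearly full ones, every candidate fails either this load test or the free-space test, which is exactly the squeeze the issue describes; a configurable factor loosens one side of it.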
[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery
[ https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994595#comment-14994595 ] Hudson commented on HDFS-9236: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1371 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1371/]) HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java > Missing sanity check for block size during block recovery > - > > Key: HDFS-9236 > URL: https://issues.apache.org/jira/browse/HDFS-9236 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu > Fix For: 2.8.0 > > Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, > HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, > HDFS-9236.006.patch, HDFS-9236.007.patch > > > Ran into an issue while running tests against faulty data-node code. > Currently in DataNode.java: > {code:java} > /** Block synchronization */ > void syncBlock(RecoveringBlock rBlock, > List<BlockRecord> syncList) throws IOException { > … > // Calculate the best available replica state. 
> ReplicaState bestState = ReplicaState.RWR; > … > // Calculate list of nodes that will participate in the recovery > // and the new block size > List<BlockRecord> participatingList = new ArrayList<BlockRecord>(); > final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId, > -1, recoveryId); > switch(bestState) { > … > case RBW: > case RWR: > long minLength = Long.MAX_VALUE; > for(BlockRecord r : syncList) { > ReplicaState rState = r.rInfo.getOriginalReplicaState(); > if(rState == bestState) { > minLength = Math.min(minLength, r.rInfo.getNumBytes()); > participatingList.add(r); > } > } > newBlock.setNumBytes(minLength); > break; > … > } > … > nn.commitBlockSynchronization(block, > newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false, > datanodes, storages); > } > {code} > This code is called by the DN coordinating the block recovery. In the above > case, it is possible for none of the rState values (reported by DNs with copies of > the replica being recovered) to match the bestState. This can either be > caused by faulty DN code or stale/modified/corrupted files on the DN. When this > happens the DN will end up reporting a minLength of Long.MAX_VALUE. > Unfortunately there is no check on the NN for replica length. 
See > FSNamesystem.java: > {code:java} > void commitBlockSynchronization(ExtendedBlock oldBlock, > long newgenerationstamp, long newlength, > boolean closeFile, boolean deleteblock, DatanodeID[] newtargets, > String[] newtargetstorages) throws IOException { > … > if (deleteblock) { > Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock); > boolean remove = iFile.removeLastBlock(blockToDel) != null; > if (remove) { > blockManager.removeBlock(storedBlock); > } > } else { > // update last block > if(!copyTruncate) { > storedBlock.setGenerationStamp(newgenerationstamp); > > // XXX block length is updated without any check <<< storedBlock.setNumBytes(newlength); > } > … > if (closeFile) { > LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock > + ", file=" + src > + (copyTruncate ? ", newBlock=" + truncatedBlock > : ", newgenerationstamp=" + newgenerationstamp) > + ", newlength=" + newlength > + ", newtargets=" + Arrays.asList(newtargets) + ") successful"); > } else { > LOG.info("commitBlockSynchronization(" + oldBlock + ") successful"); > } > } > {code} > After this point the block length becomes Long.MAX_VALUE. Any subsequent > block report (even with correct length) will cause the block to be marked as > corrupted. Since this block could be the last block of the file, if this > happens and the client goes away, the NN won’t be able to recover the lease and > close the file because the last block is under-replicated. > I believe we need to have a sanity check for block size on both the DN and the NN to > prevent such a case from happening. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
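The sanity check the report asks for can be sketched as follows. This is an illustrative toy, not the committed HDFS-9236 patch: the class and method names are invented, and the real fix lives in the DN recovery worker and the NN commit path. The point is simply that a recovered length still equal to the Long.MAX_VALUE sentinel (or negative) should be rejected before the stored block is updated.

```java
// Hypothetical sketch of the missing length check (names are invented).
public class BlockLengthSanityCheck {
    /**
     * A recovered block length is plausible only when it is non-negative and
     * was actually lowered from the Long.MAX_VALUE "no replica matched" sentinel.
     */
    static boolean isValidRecoveredLength(long newLength) {
        return newLength >= 0 && newLength != Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(isValidRecoveredLength(1024L));          // plausible length
        System.out.println(isValidRecoveredLength(Long.MAX_VALUE)); // sentinel leaked through
    }
}
```

Applying such a guard on both sides (the DN before calling commitBlockSynchronization, the NN before setNumBytes) keeps a single faulty reporter from corrupting the stored block length.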
[jira] [Commented] (HDFS-9318) considerLoad factor can be improved
[ https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994589#comment-14994589 ] Hudson commented on HDFS-9318: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #648 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/648/]) HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java > considerLoad factor can be improved > --- > > Key: HDFS-9318 > URL: https://issues.apache.org/jira/browse/HDFS-9318 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch > > > Currently considerLoad avoids choosing nodes that are too active, so it helps > level the HDFS load across the cluster. Under normal conditions, this is > desired. However, when a cluster has a large percentage of nearly full nodes, > this can make it difficult to find good targets because the placement policy > wants to avoid the full nodes, but considerLoad wants to avoid the busy > less-full nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery
[ https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994587#comment-14994587 ] Hudson commented on HDFS-9236: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #648 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/648/]) HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java > Missing sanity check for block size during block recovery > - > > Key: HDFS-9236 > URL: https://issues.apache.org/jira/browse/HDFS-9236 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu > Fix For: 2.8.0 > > Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, > HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, > HDFS-9236.006.patch, HDFS-9236.007.patch > > > Ran into an issue while running tests against faulty data-node code. > Currently in DataNode.java: > {code:java} > /** Block synchronization */ > void syncBlock(RecoveringBlock rBlock, > List<BlockRecord> syncList) throws IOException { > … > // Calculate the best available replica state. 
> ReplicaState bestState = ReplicaState.RWR; > … > // Calculate list of nodes that will participate in the recovery > // and the new block size > List<BlockRecord> participatingList = new ArrayList<BlockRecord>(); > final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId, > -1, recoveryId); > switch(bestState) { > … > case RBW: > case RWR: > long minLength = Long.MAX_VALUE; > for(BlockRecord r : syncList) { > ReplicaState rState = r.rInfo.getOriginalReplicaState(); > if(rState == bestState) { > minLength = Math.min(minLength, r.rInfo.getNumBytes()); > participatingList.add(r); > } > } > newBlock.setNumBytes(minLength); > break; > … > } > … > nn.commitBlockSynchronization(block, > newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false, > datanodes, storages); > } > {code} > This code is called by the DN coordinating the block recovery. In the above > case, it is possible for none of the rState values (reported by DNs with copies of > the replica being recovered) to match the bestState. This can either be > caused by faulty DN code or stale/modified/corrupted files on the DN. When this > happens the DN will end up reporting a minLength of Long.MAX_VALUE. > Unfortunately there is no check on the NN for replica length. 
See > FSNamesystem.java: > {code:java} > void commitBlockSynchronization(ExtendedBlock oldBlock, > long newgenerationstamp, long newlength, > boolean closeFile, boolean deleteblock, DatanodeID[] newtargets, > String[] newtargetstorages) throws IOException { > … > if (deleteblock) { > Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock); > boolean remove = iFile.removeLastBlock(blockToDel) != null; > if (remove) { > blockManager.removeBlock(storedBlock); > } > } else { > // update last block > if(!copyTruncate) { > storedBlock.setGenerationStamp(newgenerationstamp); > > // XXX block length is updated without any check <<< storedBlock.setNumBytes(newlength); > } > … > if (closeFile) { > LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock > + ", file=" + src > + (copyTruncate ? ", newBlock=" + truncatedBlock > : ", newgenerationstamp=" + newgenerationstamp) > + ", newlength=" + newlength > + ", newtargets=" + Arrays.asList(newtargets) + ") successful"); > } else { > LOG.info("commitBlockSynchronization(" + oldBlock + ") successful"); > } > } > {code} > After this point the block length becomes Long.MAX_VALUE. Any subsequent > block report (even with correct length) will cause the block to be marked as > corrupted. Since this block could be the last block of the file, if this > happens and the client goes away, the NN won’t be able to recover the lease and > close the file because the last block is under-replicated. > I believe we need to have a sanity check for block size on both the DN and the NN to > prevent such a case from happening. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994588#comment-14994588 ] Hudson commented on HDFS-6481: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #648 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/648/]) HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 0b18e5e8c69b40c9a446fff448d38e0dd10cb45e) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > DatanodeManager#getDatanodeStorageInfos() should check the length of > storageIDs > --- > > Key: HDFS-6481 > URL: https://issues.apache.org/jira/browse/HDFS-6481 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Ted Yu >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 2.7.2 > > Attachments: h6481_20151105.patch, hdfs-6481-v1.txt > > > Ian Brooks reported the following stack trace: > {code} > 2014-06-03 13:05:03,915 WARN [DataStreamer for file > /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200 > block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] > hdfs.DFSClient: DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): > 0 > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594) > at > 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266) > at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919) > at > org.apache.hadoop.hdfs.DFSOu
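The check requested in the summary amounts to bounds-checking {{storageIDs}} before indexing into it, since an empty or short array is what produced the {{ArrayIndexOutOfBoundsException: 0}} above. A minimal sketch with invented names (the actual change is inside {{DatanodeManager#getDatanodeStorageInfos()}}):

```java
// Hypothetical guard (names invented): tolerate callers that supply fewer
// storageIDs than datanodes instead of indexing blindly.
public class StorageIdGuard {
    static String storageIdFor(String[] storageIDs, int i) {
        // Return null rather than throwing when no matching storageID exists.
        return (storageIDs != null && i < storageIDs.length) ? storageIDs[i] : null;
    }

    public static void main(String[] args) {
        // The unguarded code did storageIDs[0] on an empty array and threw.
        System.out.println(storageIdFor(new String[0], 0)); // null, no exception
    }
}
```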
[jira] [Commented] (HDFS-9364) Unnecessary DNS resolution attempts when creating NameNodeProxies
[ https://issues.apache.org/jira/browse/HDFS-9364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994550#comment-14994550 ] Zhe Zhang commented on HDFS-9364: - Thanks for clarifying this Xiao. I agree with the approach in 03 patch. So +1 pending the minor fix. > Unnecessary DNS resolution attempts when creating NameNodeProxies > - > > Key: HDFS-9364 > URL: https://issues.apache.org/jira/browse/HDFS-9364 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, performance >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9364.001.patch, HDFS-9364.002.patch, > HDFS-9364.003.patch > > > When creating NameNodeProxies, we always try to DNS-resolve namenode URIs. > This is unnecessary if the URI is logical, and may be significantly slow if > the DNS is having problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
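The idea behind the fix can be illustrated simply: when a URI's authority is one of the configured HA nameservice IDs, the URI is logical, so DNS resolution of it is wasted (and potentially slow) work. The helper below is a hedged sketch with invented names, not the actual patch, which touches the proxy-creation path:

```java
import java.net.URI;
import java.util.Set;

// Invented helper illustrating the logical-URI short-circuit.
public class LogicalUriCheck {
    /** DNS resolution is only needed when the authority is a real hostname. */
    static boolean needsDnsResolution(URI nnUri, Set<String> logicalNameservices) {
        return !logicalNameservices.contains(nnUri.getHost());
    }

    public static void main(String[] args) {
        Set<String> ns = Set.of("mycluster"); // e.g. taken from dfs.nameservices
        System.out.println(needsDnsResolution(URI.create("hdfs://mycluster"), ns));            // false
        System.out.println(needsDnsResolution(URI.create("hdfs://nn1.example.com:8020"), ns)); // true
    }
}
```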
[jira] [Updated] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.
[ https://issues.apache.org/jira/browse/HDFS-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-9394: Hadoop Flags: Reviewed +1 for the patch, pending pre-commit run. I verified locally that the hadoop-hdfs-client tests pass on branch-2 after applying this patch. > branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader > initialization, because HftpFileSystem is missing. > > > Key: HDFS-9394 > URL: https://issues.apache.org/jira/browse/HDFS-9394 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Chris Nauroth >Assignee: Mingliang Liu >Priority: Critical > Attachments: HDFS-9394.000.branch-2.patch > > > On branch-2, hadoop-hdfs-client contains a {{FileSystem}} service descriptor > that lists {{HftpFileSystem}} and {{HsftpFileSystem}}. These classes do not > reside in hadoop-hdfs-client. Instead, they reside in hadoop-hdfs. If the > application has hadoop-hdfs-client.jar on the classpath, but not > hadoop-hdfs.jar, then this can cause a {{ServiceConfigurationError}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9236) Missing sanity check for block size during block recovery
[ https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-9236: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks [~twu] for the contribution and [~walter.k.su] for the review. > Missing sanity check for block size during block recovery > - > > Key: HDFS-9236 > URL: https://issues.apache.org/jira/browse/HDFS-9236 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu > Fix For: 2.8.0 > > Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, > HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, > HDFS-9236.006.patch, HDFS-9236.007.patch > > > Ran into an issue while running tests against faulty data-node code. > Currently in DataNode.java: > {code:java} > /** Block synchronization */ > void syncBlock(RecoveringBlock rBlock, > List<BlockRecord> syncList) throws IOException { > … > // Calculate the best available replica state. > ReplicaState bestState = ReplicaState.RWR; > … > // Calculate list of nodes that will participate in the recovery > // and the new block size > List<BlockRecord> participatingList = new ArrayList<BlockRecord>(); > final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId, > -1, recoveryId); > switch(bestState) { > … > case RBW: > case RWR: > long minLength = Long.MAX_VALUE; > for(BlockRecord r : syncList) { > ReplicaState rState = r.rInfo.getOriginalReplicaState(); > if(rState == bestState) { > minLength = Math.min(minLength, r.rInfo.getNumBytes()); > participatingList.add(r); > } > } > newBlock.setNumBytes(minLength); > break; > … > } > … > nn.commitBlockSynchronization(block, > newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false, > datanodes, storages); > } > {code} > This code is called by the DN coordinating the block recovery. 
In the above > case, it is possible for none of the rState values (reported by DNs with copies of > the replica being recovered) to match the bestState. This can either be > caused by faulty DN code or stale/modified/corrupted files on the DN. When this > happens the DN will end up reporting a minLength of Long.MAX_VALUE. > Unfortunately there is no check on the NN for replica length. See > FSNamesystem.java: > {code:java} > void commitBlockSynchronization(ExtendedBlock oldBlock, > long newgenerationstamp, long newlength, > boolean closeFile, boolean deleteblock, DatanodeID[] newtargets, > String[] newtargetstorages) throws IOException { > … > if (deleteblock) { > Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock); > boolean remove = iFile.removeLastBlock(blockToDel) != null; > if (remove) { > blockManager.removeBlock(storedBlock); > } > } else { > // update last block > if(!copyTruncate) { > storedBlock.setGenerationStamp(newgenerationstamp); > > // XXX block length is updated without any check <<< storedBlock.setNumBytes(newlength); > } > … > if (closeFile) { > LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock > + ", file=" + src > + (copyTruncate ? ", newBlock=" + truncatedBlock > : ", newgenerationstamp=" + newgenerationstamp) > + ", newlength=" + newlength > + ", newtargets=" + Arrays.asList(newtargets) + ") successful"); > } else { > LOG.info("commitBlockSynchronization(" + oldBlock + ") successful"); > } > } > {code} > After this point the block length becomes Long.MAX_VALUE. Any subsequent > block report (even with correct length) will cause the block to be marked as > corrupted. Since this block could be the last block of the file, if this > happens and the client goes away, the NN won’t be able to recover the lease and > close the file because the last block is under-replicated. > I believe we need to have a sanity check for block size on both the DN and the NN to > prevent such a case from happening. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994539#comment-14994539 ] Bob Hansen commented on HDFS-9117: -- bq. 2. Adding search paths and parsing them can be replaced by passing in a vector of paths. Parsing the environment variable is specific to the compatibility layer of libhdfs. I agree that we should take a vector of paths rather than parsing a string; thanks for that suggestion. See above comment re: dereferencing HADOOP_HOME. > Config file reader / options classes for libhdfs++ > -- > > Key: HDFS-9117 > URL: https://issues.apache.org/jira/browse/HDFS-9117 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9117.HDFS-8707.001.patch, > HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, > HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, > HDFS-9117.HDFS-8707.006.patch, HDFS-9117.HDFS-8707.008.patch, > HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch, > HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, > HDFS-9117.HDFS-9288.007.patch > > > For environmental compatibility with HDFS installations, libhdfs++ should be > able to read the configurations from Hadoop XML files and behave in line with > the Java implementation. > Most notably, machine names and ports should be readable from Hadoop XML > configuration files. > Similarly, an internal Options architecture for libhdfs++ should be developed > to efficiently transport the configuration information within the system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994534#comment-14994534 ] Bob Hansen commented on HDFS-9117: -- [~wheat9]: thanks for the feedback and carrying the conversation forward. The primary use case of the Configuration class (as I see it) is to provide compatibility not with libhdfs, but with deployed Hadoop java environments. As an example, let's say we are writing a native replacement for the dfs tool using the native libhdfs++ codebase (not the libhdfs compatibility layer) that can do "-ls" and "-copyFromLocal", etc. To provide Least Astonishment for our consumers, they would expect that a properly configured Hadoop node [with the HADOOP_HOME pointing to /etc/hadoop-2.9.9 and its config files] could run "hdfspp -ls /tmp" and have it automatically find the NN and configure the communications parameters correctly to talk to their cluster. To fully support that use case, we need to read XML in the currently-deployed file format (which specifies that we honor "final" tags where they appear in the files), and dereference at least HADOOP_HOME in loading the default files. We could force our consumers to do that, but that doesn't seem a kindness for code we need to write anyway. We also need to be able to read the encodings that are being used in the field (such as "1M" for buffer sizes). If we really think that config-substitution and environmental substitution are exceedingly rare in the field, we can defer the work, but I am concerned that we will deploy a libhdfs++ application to the field only to find that it can't read an early adopter's config file. That work has already been shuffled off to HDFS-9385 so we can revisit it later. Other use cases may not need to read existing hdfs-site.xml files, which is why I think you are wise in having a separation between the Config reader and the Options object. 
I agree with your concern that the libhdfs++ default Options object will get out of sync with the Java defaults, and will happily write a unit test that verifies that they stay together. [~wheat9] - given this context, do you agree that we need to support libhdfs++ compatibility with the hdfs-site.xml files that are already deployed at customer sites? > Config file reader / options classes for libhdfs++ > -- > > Key: HDFS-9117 > URL: https://issues.apache.org/jira/browse/HDFS-9117 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9117.HDFS-8707.001.patch, > HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, > HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, > HDFS-9117.HDFS-8707.006.patch, HDFS-9117.HDFS-8707.008.patch, > HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch, > HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, > HDFS-9117.HDFS-9288.007.patch > > > For environmental compatibility with HDFS installations, libhdfs++ should be > able to read the configurations from Hadoop XML files and behave in line with > the Java implementation. > Most notably, machine names and ports should be readable from Hadoop XML > configuration files. > Similarly, an internal Options architecture for libhdfs++ should be developed > to efficiently transport the configuration information within the system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9387) Parse namenodeUri parameter only once in NNThroughputBenchmark$OperationStatsBase#verifyOpArgument()
[ https://issues.apache.org/jira/browse/HDFS-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994501#comment-14994501 ] Mingliang Liu commented on HDFS-9387: - I think the failing tests are unrelated. > Parse namenodeUri parameter only once in > NNThroughputBenchmark$OperationStatsBase#verifyOpArgument() > > > Key: HDFS-9387 > URL: https://issues.apache.org/jira/browse/HDFS-9387 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9387.000.patch > > > In {{NNThroughputBenchmark$OperationStatsBase#verifyOpArgument()}}, the > {{namenodeUri}} is always parsed from {{-namenode}} argument. This works just > fine if the {{-op}} parameter is not {{all}}, as the single benchmark will > need to parse the {{namenodeUri}} from args anyway. > When the {{-op}} is {{all}}, namely all sub-benchmarks will run, multiple > sub-benchmarks will call the {{verifyOpArgument()}} method. In this case, the > first sub-benchmark reads the {{namenode}} argument and removes it from args. > The other sub-benchmarks will thereafter read {{null}} value since the > argument is removed. This contradicts the intention of providing {{namenode}} > for all sub-benchmarks. > {code:title=current code} > try { > namenodeUri = StringUtils.popOptionWithArgument("-namenode", args); > } catch (IllegalArgumentException iae) { > printUsage(); > } > {code} > The fix is to parse the {{namenodeUri}}, which is shared by all > sub-benchmarks, from {{-namenode}} argument only once. This follows the > convention of parsing other global arguments in > {{OperationStatsBase#verifyOpArgument()}}. > {code:title=simple fix} > if (args.indexOf("-namenode") >= 0) { > try { > namenodeUri = StringUtils.popOptionWithArgument("-namenode", args); > } catch (IllegalArgumentException iae) { > printUsage(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
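The destructive parse described above can be reproduced with a self-contained stand-in for popOptionWithArgument (the real helper lives in org.apache.hadoop.util.StringUtils; this simplified version only mimics the remove-on-read behavior):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PopOptionDemo {
    // Simplified stand-in: returns the argument following the flag and
    // removes both from the list, or returns null if the flag is absent.
    static String popOptionWithArgument(String flag, List<String> args) {
        int i = args.indexOf(flag);
        if (i < 0) return null;
        args.remove(i);        // remove the flag itself
        return args.remove(i); // remove and return its value
    }

    public static void main(String[] a) {
        List<String> args = new ArrayList<>(
            Arrays.asList("-namenode", "hdfs://nn:8020", "-op", "all"));
        // First sub-benchmark consumes -namenode...
        String first = popOptionWithArgument("-namenode", args);
        // ...so every later sub-benchmark sees null.
        String second = popOptionWithArgument("-namenode", args);
        System.out.println(first + " / " + second);
    }
}
```

This is why the proposed fix guards the pop with an indexOf check and parses the shared argument only once.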
[jira] [Updated] (HDFS-9396) Total files and directories on jmx and web UI on standby is uninitialized
[ https://issues.apache.org/jira/browse/HDFS-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9396: - Attachment: HDFS-9396.patch > Total files and directories on jmx and web UI on standby is uninitialized > - > > Key: HDFS-9396 > URL: https://issues.apache.org/jira/browse/HDFS-9396 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Blocker > Attachments: HDFS-9396.patch > > > After HDFS-6763, the quota on the standby namenode is not being updated until > it transitions to active. This causes the jmx and the web ui files and dir > count to be uninitialized or unupdated. In some cases it shows a negative > number. > It is because the legacy way of getting the inode count, which existed since > before the creation of inode table. It relies on the root inode's quota being > properly updated. We can make it simply return the size of the inode table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9396) Total files and directories on jmx and web UI on standby is uninitialized
[ https://issues.apache.org/jira/browse/HDFS-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9396: - Status: Patch Available (was: Open) > Total files and directories on jmx and web UI on standby is uninitialized > - > > Key: HDFS-9396 > URL: https://issues.apache.org/jira/browse/HDFS-9396 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Blocker > Attachments: HDFS-9396.patch > > > After HDFS-6763, the quota on the standby namenode is not being updated until > it transitions to active. This causes the jmx and the web ui files and dir > count to be uninitialized or unupdated. In some cases it shows a negative > number. > It is because the legacy way of getting the inode count, which existed since > before the creation of inode table. It relies on the root inode's quota being > properly updated. We can make it simply return the size of the inode table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9396) Total files and directories on jmx and web UI on standby is uninitialized
[ https://issues.apache.org/jira/browse/HDFS-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-9396: Assignee: Kihwal Lee > Total files and directories on jmx and web UI on standby is uninitialized > - > > Key: HDFS-9396 > URL: https://issues.apache.org/jira/browse/HDFS-9396 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-9396.patch > > > After HDFS-6763, the quota on the standby namenode is not being updated until > it transitions to active. This causes the jmx and the web ui files and dir > count to be uninitialized or unupdated. In some cases it shows a negative > number. > It is because the legacy way of getting the inode count, which existed since > before the creation of inode table. It relies on the root inode's quota being > properly updated. We can make it simply return the size of the inode table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy
[ https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-8905: Attachment: HDFS-8905-v2.patch > Refactor DFSInputStream#ReaderStrategy > -- > > Key: HDFS-8905 > URL: https://issues.apache.org/jira/browse/HDFS-8905 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: HDFS-7285 > > Attachments: HDFS-8905-HDFS-7285-v1.patch, HDFS-8905-v2.patch > > > DFSInputStream#ReaderStrategy family don't look very good. This refactors a > little bit to make them make more sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
[ https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994491#comment-14994491 ] Mingliang Liu commented on HDFS-9379: - Thanks for your review [~arpitagarwal]. {quote} Did you get a chance to test it manually? {quote} Yes I did test this manually. I ran the test with different combinations of arguments, including {{-namenode}}, {{-datanodes}} and {{blocksPerReport}}. If the {{-datanodes}} is greater than 9, the trunk code will run the benchmark successfully with this patch, and it will fail without this patch. The failing code is the assertion which checks the lexicographical order of datanodes. {quote} The unit test Test NNThroughputBenchmark looks inadequate. It passed even when I replaced the dnIdx computation with zero. {quote} The {{TestNNThroughputBenchmark}} seems to be a driver for running the benchmark, rather than a unit test of the benchmark itself. Thus I did not change it. If we make the {{dnIdx}} always zero when searching for the datanode index in the {{datanodes}} array given the datanode info, the test could pass as the generated block will always be added to the first datanode. The benchmark itself allows this, though the test results will be dubious. {quote} I looked through the remaining usages of datanodes for any dependencies on lexical ordering and didn't find any. {quote} That's true. The {{BlockReportStats}} is the only use case I found that depends on the lexical ordering of the {{datanodes}} array. I ran other tests and they look good when the {{-datanodes}} or {{-threads}} is greater than 10. 
> Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes > -- > > Key: HDFS-9379 > URL: https://issues.apache.org/jira/browse/HDFS-9379 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9379.000.patch > > > Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on > sorted {{datanodes}} array in the lexicographical order of datanode's > {{xferAddr}}. > * There is an assertion of datanode's {{xferAddr}} lexicographical order when > filling the {{datanodes}}, see [the > code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152]. > * When searching the datanode by {{DatanodeInfo}}, it uses binary search > against the {{datanodes}} array, see [the > code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187] > In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In > {{NNThroughputBenchmark}}, the port is simply _the index of the tiny > datanode_ plus one. > The problem here is that, when there are more than 9 tiny datanodes > ({{numThreads}}), the lexicographical order of datanode's {{xferAddr}} will > be invalid as the string value of datanode index is not in lexicographical > order any more. For example, > {code} > ... > 192.168.54.40:8 > 192.168.54.40:9 > 192.168.54.40:10 > 192.168.54.40:11 > ... > {code} > {{192.168.54.40:9}} is greater than {{192.168.54.40:10}}. The assertion will > fail and the binary search won't work. > The simple fix is to calculate the datanode index by port directly, instead > of using binary search. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
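The ordering problem in the description, and the index-from-port fix, can be shown in a few lines of Java (illustrative only; the names are not taken from the patch):

```java
public class XferAddrOrder {
    // Recover the datanode index from the port directly, as the patch
    // proposes, instead of binary-searching a supposedly sorted array.
    static int dnIndexFromXferAddr(String xferAddr) {
        int port = Integer.parseInt(
            xferAddr.substring(xferAddr.lastIndexOf(':') + 1));
        return port - 1; // NNThroughputBenchmark assigns port = index + 1
    }

    public static void main(String[] args) {
        // As strings, ":9" sorts after ":10", so the lexicographic-order
        // assertion breaks once there are 10 or more tiny datanodes.
        System.out.println("192.168.54.40:9".compareTo("192.168.54.40:10") > 0);
        System.out.println(dnIndexFromXferAddr("192.168.54.40:10")); // 9
    }
}
```

With the index derived arithmetically, neither the sort assertion nor the binary search is needed.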
[jira] [Commented] (HDFS-9396) Total files and directories on jmx and web UI on standby is uninitialized
[ https://issues.apache.org/jira/browse/HDFS-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994475#comment-14994475 ] Kihwal Lee commented on HDFS-9396: -- {code:java} long totalInodes() { readLock(); try { return rootDir.getDirectoryWithQuotaFeature().getSpaceConsumed() .getNameSpace(); } finally { readUnlock(); } } {code} It can simply do without locking. {code:java} return inodeMap.size(); {code} > Total files and directories on jmx and web UI on standby is uninitialized > - > > Key: HDFS-9396 > URL: https://issues.apache.org/jira/browse/HDFS-9396 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Blocker > > After HDFS-6763, the quota on the standby namenode is not being updated until > it transitions to active. This causes the jmx and the web ui files and dir > count to be uninitialized or unupdated. In some cases it shows a negative > number. > It is because the legacy way of getting the inode count, which existed since > before the creation of inode table. It relies on the root inode's quota being > properly updated. We can make it simply return the size of the inode table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
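The suggested change amounts to returning the inode map's size rather than walking the root quota feature. A minimal sketch of the idea (illustrative only: the NameNode's inode map is a GSet, not a ConcurrentHashMap, but the point is the same — size() reflects the live inode count with no dependency on the root quota being up to date and no read lock):

```java
import java.util.concurrent.ConcurrentHashMap;

public class InodeCount {
    // Stand-in for FSDirectory's inode map (really a GSet in the NN).
    private final ConcurrentHashMap<Long, String> inodeMap = new ConcurrentHashMap<>();

    void addInode(long id, String path) {
        inodeMap.put(id, path);
    }

    // No lock needed: the map's size is the authoritative inode count,
    // on active and standby alike.
    long totalInodes() {
        return inodeMap.size();
    }

    public static void main(String[] args) {
        InodeCount fsd = new InodeCount();
        fsd.addInode(1L, "/");
        fsd.addInode(2L, "/tmp");
        System.out.println(fsd.totalInodes()); // 2
    }
}
```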
[jira] [Created] (HDFS-9396) Total files and directories on jmx and web UI on standby is uninitialized
Kihwal Lee created HDFS-9396: Summary: Total files and directories on jmx and web UI on standby is uninitialized Key: HDFS-9396 URL: https://issues.apache.org/jira/browse/HDFS-9396 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Blocker After HDFS-6763, the quota on the standby namenode is not being updated until it transitions to active. This causes the jmx and the web UI files and dir count to be uninitialized or stale. In some cases it shows a negative number. This is because the inode count is obtained in a legacy way that has existed since before the creation of the inode table; it relies on the root inode's quota being properly updated. We can make it simply return the size of the inode table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots
[ https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994469#comment-14994469 ] Harsh J commented on HDFS-8986: --- This continues to cause a bunch of confusion among our user-base who are still reliant on the pre-snapshot feature behaviour, so it would be nice to see it implemented. > Add option to -du to calculate directory space usage excluding snapshots > > > Key: HDFS-8986 > URL: https://issues.apache.org/jira/browse/HDFS-8986 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Reporter: Gautam Gopalakrishnan >Assignee: Jagadesh Kiran N > > When running {{hadoop fs -du}} on a snapshotted directory (or one of its > children), the report includes space consumed by blocks that are only present > in the snapshots. This is confusing for end users. > {noformat} > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -createSnapshot /tmp/parent snap1 > Created snapshot /tmp/parent/.snapshot/snap1 > $ hadoop fs -rm -skipTrash /tmp/parent/sub1/* > ... > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -deleteSnapshot /tmp/parent snap1 > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 0 0 /tmp/parent > 0 0 /tmp/parent/sub1 > {noformat} > It would be helpful if we had a flag, say -X, to exclude any snapshot related > disk usage in the output -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994468#comment-14994468 ] Haohui Mai commented on HDFS-9117: -- Thinking about the patch a little bit more, I believe that there is little value in following the original implementation in Java. The implementation looks overcomplicated. What are the minimal pieces needed to load the configuration? Does the class need to address every aspect of the concerns? To me the core part of the patch should be parsing XML and populating the map of configuration. Functionality like searching through the filesystem, environment variables, expanding the configuration, etc., are all optional. 1. Note that the main motivation of the {{Options}} class is to make all the configurations standalone and explicit. Users should be able to specify all configuration through the {{Options}} object. Default values are fundamental parts of the contracts in the {{Options}} class. Filling the configuration with the default values from the {{-*default.xml}} creates inconsistencies and bugs that are hard to detect. The flip side is that {{Options}} can get out of date if someone changes the default value of the configuration, however it can be caught effectively through adding a unit test. 2. Adding search paths and parsing them can be replaced by passing in a {{vector}} of paths. Parsing the environment variable is specific to the compatibility layer of {{libhdfs}}. 3. Some of the functionality might be useful at a later time. Since the code that uses that functionality is yet-to-be-written, it is difficult to review and justify what is the appropriate design and implementation. We can revisit some of these issues once the code that actually uses the functionality is available. I propose the following interfaces: {code} class Configuration { public: enum Priority { kDefault, kSpecified, kFinal, }; /** * Load configurations that are specified in XML format. 
* Each configuration file is associated with a priority. * A configuration with higher priority will overwrite the ones with lower priority. **/ int ParseXMLConfiguration(const std::string &xml, Priority priority); /** * Get the value configuration, return empty if it's unspecified. **/ template <class T> Optional<T> get(const std::string &key); /** * Get the value configuration, return the default value of configuration * if it's unspecified. **/ template <class T> T getWithDefault(const std::string &key); }; {code} > Config file reader / options classes for libhdfs++ > -- > > Key: HDFS-9117 > URL: https://issues.apache.org/jira/browse/HDFS-9117 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9117.HDFS-8707.001.patch, > HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, > HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, > HDFS-9117.HDFS-8707.006.patch, HDFS-9117.HDFS-8707.008.patch, > HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch, > HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, > HDFS-9117.HDFS-9288.007.patch > > > For environmental compatibility with HDFS installations, libhdfs++ should be > able to read the configurations from Hadoop XML files and behave in line with > the Java implementation. > Most notably, machine names and ports should be readable from Hadoop XML > configuration files. > Similarly, an internal Options architecture for libhdfs++ should be developed > to efficiently transport the configuration information within the system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
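The priority semantics in the proposed interface can be sketched in Java (a sketch of the proposal's overwrite rule only — names mirror the C++ proposal and are not actual libhdfs++ code): a later load only replaces a key's value when its priority is at least the priority that set the current value, so kFinal beats kSpecified, which beats kDefault.

```java
import java.util.HashMap;
import java.util.Map;

public class PriorityConfig {
    enum Priority { DEFAULT, SPECIFIED, FINAL } // mirrors kDefault/kSpecified/kFinal

    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Priority> priorities = new HashMap<>();

    // Higher (or equal) priority overwrites; lower priority is ignored.
    void set(String key, String value, Priority p) {
        Priority existing = priorities.get(key);
        if (existing == null || p.compareTo(existing) >= 0) {
            values.put(key, value);
            priorities.put(key, p);
        }
    }

    String get(String key) { return values.get(key); }

    public static void main(String[] args) {
        PriorityConfig c = new PriorityConfig();
        c.set("dfs.blocksize", "134217728", Priority.DEFAULT);
        c.set("dfs.blocksize", "268435456", Priority.FINAL);
        c.set("dfs.blocksize", "1048576", Priority.SPECIFIED); // ignored: final wins
        System.out.println(c.get("dfs.blocksize")); // 268435456
    }
}
```

This also captures the "final" tag behavior Bob raised earlier: a value loaded at final priority cannot be clobbered by a later user-specified file.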
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994465#comment-14994465 ] Bob Hansen commented on HDFS-9144: -- Hm. Very noisy. Probably want a squashed pull request next time, but it looks like GitHub is now available to use for review. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994458#comment-14994458 ] ASF GitHub Bot commented on HDFS-9144: -- GitHub user bobhansen opened a pull request: https://github.com/apache/hadoop/pull/43 HDFS-9144: libhdfs++ refactoring Code changes for HDFS-9144 as described in the JIRA. Removing some templates and traits and restructuring the code for more modularity. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bobhansen/hadoop HDFS-9144-merge Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hadoop/pull/43.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #43 commit 1fb1ea527c9b5321e6da6c2543859db2ec3eaf7c Author: Bob Hansen Date: 2015-10-22T11:58:41Z Refactored NameNodeConnection commit c6cf5175b9c21561bdcbd22be27f50e22a1d3ebd Author: Bob Hansen Date: 2015-10-22T12:01:36Z Removed fs_ from InputStream commit 8b8190d334224d8acec9a4bef97d5e0226c1045a Author: Bob Hansen Date: 2015-10-22T13:05:53Z Moved GetBlockInfo to NN connection commit 108b54f3079ed21149a59b9222d6d9832ee05d79 Author: Bob Hansen Date: 2015-10-22T13:20:56Z Moved GetBlockLocations to std::function commit 6d112a17048bcec437701b422209641e56f6196e Author: Bob Hansen Date: 2015-10-22T13:48:02Z Added comments commit e57b0ed02e29781f347499f0f3546659870aabab Author: Bob Hansen Date: 2015-10-22T13:52:39Z Stripped whitespace commit c9c82125e8c0b742ee3a70d6fdbdedca180cdd4f Author: Bob Hansen Date: 2015-10-27T16:07:33Z Renamed NameNodeConnection to NameNodeOperations commit 01499b6027ec771ebf04d4723899ee976b2a6044 Author: Bob Hansen Date: 2015-10-27T23:26:26Z Renamed input_stream and asio_continuation commit 02c67837fe832e45286a675f1a27fa29e1b80a9a Author: Bob Hansen Date: 2015-10-27T23:30:44Z Renamed CreatePipeline to Connect commit 
5d28d02e1752be74975647f8dc656776ab9e2cbf Author: Bob Hansen Date: 2015-10-27T23:58:18Z Rename async_connect to async_request commit 9d98bf41091c923103cbeeadb5459c3119b50584 Author: Bob Hansen Date: 2015-10-28T13:01:38Z Renamed read_some to read_packet commit 6ced4a97e297ce0e833db8dbd4b38c91c966d71c Author: Bob Hansen Date: 2015-10-28T13:15:50Z Renamed async_request to async_request_block commit f05a771e578969b9b281de4e0c97887f98b0f2cf Author: Bob Hansen Date: 2015-10-28T13:19:09Z Renamed BlockReader::request to request_block commit fcf1585bf67f84ef8c0acc72660d2ad250005e3b Author: Bob Hansen Date: 2015-10-28T19:12:39Z Moved to file_info commit a3fd975285b25a3eae448e5ac46d0118a14d6610 Author: Bob Hansen Date: 2015-10-28T19:16:20Z Made file_info pointers const commit 366f488b8e8364eba3f1966b931216d2bf404ae1 Author: Bob Hansen Date: 2015-10-28T21:37:46Z Refactored DataNodeConnection, etc. commit 418799feb8d12181d9e5bd6b6aa94333bb21e126 Author: Bob Hansen Date: 2015-10-29T13:53:46Z Added shared_ptr to DN_Connection commit f043e154a261e9ff64f1ead450e3a256ecd023a2 Author: Bob Hansen Date: 2015-10-29T15:31:28Z Moved DNConnection into trait commit aea859ff34a6768c7df29ec25f1abd2b92835b9e Author: Bob Hansen Date: 2015-10-29T15:32:12Z Trimmed whitespace commit 55d7b5dcd92b0fd9d0011e97d8f47e78c3316205 Author: Bob Hansen Date: 2015-10-29T17:23:30Z Re-enabled IS tests commit 142efabbda38852b431d94096d6cef69f5c96393 Author: Bob Hansen Date: 2015-10-29T17:31:05Z Cleaned up some tests commit 4bc0f448fe52a762a242428a1331272c9fee3247 Author: Bob Hansen Date: 2015-10-29T21:53:57Z Working on less templates commit dd16d4fa9f08f55f9d4140219471f002eca5a8ed Author: Bob Hansen Date: 2015-10-29T23:28:01Z Compiles! 
commit 2b14efa8277c66a3e9e0fb67af925501757d39f8 Author: Bob Hansen Date: 2015-10-30T20:46:52Z Fixed DNconnection signature commit 8d143e789a98431f8cd2cb08db37a0a05f4d9c77 Author: Bob Hansen Date: 2015-11-02T16:35:54Z Fixed segfault in ReadData commit b6f5454e626c1caa1b76398c9edf220fc1252be9 Author: Bob Hansen Date: 2015-11-02T18:36:15Z Removed BlockReader callback templates commit 3b5d712b454f5b817c22909bac2f3477a64624fe Author: Bob Hansen Date: 2015-11-02T18:52:16Z Removed last templates from BlockReader commit d9b9241f12a957226df7ccacad07d8e1a0d98cca Author: Bob Hansen Date: 2015-11-02T20:56:43Z Moved entirely over to BlockReader w/out templates commit 5de0bce35fb52b7a688d3fc4ad02748106fca38e Author: Bob Hansen Date: 2015-11-02T21:06:25Z Removed unnecessary impls commit d5baa8784643bdfed454c8a4ba0edb102d73f40a Author: Bob Hansen Date: 2015-11-03T15:00:50Z Moved DN to its own file > Refactor libhdfs int
[jira] [Updated] (HDFS-9328) Formalize coding standards for libhdfs++ and put them in a README.txt
[ https://issues.apache.org/jira/browse/HDFS-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-9328: -- Attachment: HDFS-9328.HDFS-8707.001.patch New Patch: -left in clang-format -changed name to CONTRIBUTING.md, added some markdown to make it look nicer. -Added a couple extra bits about portability to (4) that [~ste...@apache.org] suggested. I'm new to markdown. Do people typically self limit line width or just assume the rendering software will handle that? I'd appreciate any other feedback as well. > Formalize coding standards for libhdfs++ and put them in a README.txt > - > > Key: HDFS-9328 > URL: https://issues.apache.org/jira/browse/HDFS-9328 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer >Priority: Blocker > Attachments: HDFS-9328.HDFS-8707.000.patch, > HDFS-9328.HDFS-8707.001.patch > > > We have 2-3 people working on this project full time and hopefully more > people will start contributing. In order to efficiently scale we need a > single, easy to find, place where developers can check to make sure they are > following the coding standards of this project to both save their time and > save the time of people doing code reviews. > The most practical place to do this seems like a README file in libhdfspp/. > The foundation of the standards is google's C++ guide found here: > https://google-styleguide.googlecode.com/svn/trunk/cppguide.html > Any exceptions to google's standards or additional restrictions need to be > explicitly enumerated so there is one single point of reference for all > libhdfs++ code standards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9395) getContentSummary is audit logged as success even if failed
[ https://issues.apache.org/jira/browse/HDFS-9395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla reassigned HDFS-9395: - Assignee: Kuhu Shukla > getContentSummary is audit logged as success even if failed > --- > > Key: HDFS-9395 > URL: https://issues.apache.org/jira/browse/HDFS-9395 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kuhu Shukla > > Audit logging is in the finally block along with the lock unlocking, so it > is always logged as success even for cases where a FileNotFoundException is > thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery
[ https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994439#comment-14994439 ] Hudson commented on HDFS-9236: -- FAILURE: Integrated in Hadoop-trunk-Commit #8769 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8769/]) HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java > Missing sanity check for block size during block recovery > - > > Key: HDFS-9236 > URL: https://issues.apache.org/jira/browse/HDFS-9236 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu > Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, > HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, > HDFS-9236.006.patch, HDFS-9236.007.patch > > > Ran into an issue while running tests against faulty data-node code. > Currently in DataNode.java: > {code:java} > /** Block synchronization */ > void syncBlock(RecoveringBlock rBlock, > List<BlockRecord> syncList) throws IOException { > … > // Calculate the best available replica state. 
> ReplicaState bestState = ReplicaState.RWR; > … > // Calculate list of nodes that will participate in the recovery > // and the new block size > List<BlockRecord> participatingList = new ArrayList<>(); > final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId, > -1, recoveryId); > switch(bestState) { > … > case RBW: > case RWR: > long minLength = Long.MAX_VALUE; > for(BlockRecord r : syncList) { > ReplicaState rState = r.rInfo.getOriginalReplicaState(); > if(rState == bestState) { > minLength = Math.min(minLength, r.rInfo.getNumBytes()); > participatingList.add(r); > } > } > newBlock.setNumBytes(minLength); > break; > … > } > … > nn.commitBlockSynchronization(block, > newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false, > datanodes, storages); > } > {code} > This code is called by the DN coordinating the block recovery. In the above > case, it is possible for none of the rStates (reported by DNs with copies of > the replica being recovered) to match the bestState. This can either be > caused by faulty DN code or stale/modified/corrupted files on DN. When this > happens the DN will end up reporting a minLength of Long.MAX_VALUE. > Unfortunately there is no check on the NN for replica length. 
See > FSNamesystem.java: > {code:java} > void commitBlockSynchronization(ExtendedBlock oldBlock, > long newgenerationstamp, long newlength, > boolean closeFile, boolean deleteblock, DatanodeID[] newtargets, > String[] newtargetstorages) throws IOException { > … > if (deleteblock) { > Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock); > boolean remove = iFile.removeLastBlock(blockToDel) != null; > if (remove) { > blockManager.removeBlock(storedBlock); > } > } else { > // update last block > if(!copyTruncate) { > storedBlock.setGenerationStamp(newgenerationstamp); > > // XXX block length is updated without any check <<< storedBlock.setNumBytes(newlength); > } > … > if (closeFile) { > LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock > + ", file=" + src > + (copyTruncate ? ", newBlock=" + truncatedBlock > : ", newgenerationstamp=" + newgenerationstamp) > + ", newlength=" + newlength > + ", newtargets=" + Arrays.asList(newtargets) + ") successful"); > } else { > LOG.info("commitBlockSynchronization(" + oldBlock + ") successful"); > } > } > {code} > After this point the block length becomes Long.MAX_VALUE. Any subsequent > block report (even with correct length) will cause the block to be marked as > corrupted. Since this block could be the last block of the file, if this > happens and the client goes away, the NN won’t be able to recover the lease and > close the file because the last block is under-replicated. > I believe we need to have a sanity check for block size on both DN and NN to > prevent such a case from happening. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
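A hypothetical sketch of the kind of guard the description asks for — rejecting a recovered length that is still the Long.MAX_VALUE sentinel (meaning no replica matched bestState) or otherwise out of range. This is illustrative only; the actual fix landed in BlockRecoveryWorker and ReplicaRecoveryInfo, not in this exact form:

```java
public class BlockLengthSanity {
    // Reject implausible recovered lengths before committing them to the NN.
    static void checkRecoveredLength(long newLength, long maxBlockSize) {
        if (newLength == Long.MAX_VALUE) {
            throw new IllegalStateException(
                "no replica matched the best state; refusing to commit");
        }
        if (newLength < 0 || newLength > maxBlockSize) {
            throw new IllegalStateException("recovered length " + newLength
                + " out of range [0, " + maxBlockSize + "]");
        }
    }

    public static void main(String[] args) {
        checkRecoveredLength(1024, 128L << 20); // a sane length passes
        try {
            checkRecoveredLength(Long.MAX_VALUE, 128L << 20);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast here keeps a bogus Long.MAX_VALUE from ever reaching storedBlock.setNumBytes() and poisoning subsequent block reports.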
[jira] [Commented] (HDFS-9318) considerLoad factor can be improved
[ https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994440#comment-14994440 ] Hudson commented on HDFS-9318: -- FAILURE: Integrated in Hadoop-trunk-Commit #8769 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8769/]) HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java > considerLoad factor can be improved > --- > > Key: HDFS-9318 > URL: https://issues.apache.org/jira/browse/HDFS-9318 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch > > > Currently considerLoad avoids choosing nodes that are too active, so it helps > level the HDFS load across the cluster. Under normal conditions, this is > desired. However, when a cluster has a large percentage of nearly full nodes, > this can make it difficult to find good targets because the placement policy > wants to avoid the full nodes, but considerLoad wants to avoid the busy > less-full nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
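An illustrative sketch (not the actual BlockPlacementPolicyDefault code) of the considerLoad check the issue improves: a datanode is rejected as a placement target when its active transceiver count exceeds a factor times the cluster-wide average load, and the improvement is to make that factor configurable rather than hard-coded:

```java
public class ConsiderLoad {
    // A node is a good target when its load stays within loadFactor times
    // the average load (sketch of the considerLoad idea only).
    static boolean isGoodTarget(int nodeXceiverCount, double avgLoad, double loadFactor) {
        return nodeXceiverCount <= loadFactor * avgLoad;
    }

    public static void main(String[] args) {
        double avgLoad = 10.0;
        System.out.println(isGoodTarget(15, avgLoad, 2.0)); // true: 15 <= 20
        System.out.println(isGoodTarget(25, avgLoad, 2.0)); // false: 25 > 20
        // On a cluster of mostly-full nodes, raising the factor lets the
        // placement policy accept busier but less-full nodes.
        System.out.println(isGoodTarget(25, avgLoad, 3.0)); // true: 25 <= 30
    }
}
```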
[jira] [Commented] (HDFS-8708) DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies
[ https://issues.apache.org/jira/browse/HDFS-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994405#comment-14994405 ] Kihwal Lee commented on HDFS-8708: -- bq. The default value is false for HA. I think it's good enough. I agree. Besides, there are cases where we want this to be on and work with HA. E.g. the IP address change detection code in ipc Client does not work if the exception bubbles up to the HA retry logic. It only works when the retry is done within the same Client instance. > DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies > -- > > Key: HDFS-8708 > URL: https://issues.apache.org/jira/browse/HDFS-8708 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Jitendra Nath Pandey >Assignee: Brahma Reddy Battula >Priority: Critical > > DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies to > ensure fast failover. Otherwise, dfsclient retries the NN which is no longer > active and delays the failover. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9395) getContentSummary is audit logged as success even if failed
Kihwal Lee created HDFS-9395: Summary: getContentSummary is audit logged as success even if failed Key: HDFS-9395 URL: https://issues.apache.org/jira/browse/HDFS-9395 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Audit logging is in the finally block along with the lock unlocking, so the call is always logged as success, even in cases where a FileNotFoundException is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
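The bug pattern is easy to reproduce in isolation. Below is a minimal sketch of both the buggy shape and one possible fix; the class, method names, and audit-string format are hypothetical stand-ins, not the actual FSNamesystem code:

```java
// Hypothetical sketch of the HDFS-9395 pattern: an audit log statement
// placed unconditionally in a finally block records "success" even when
// the guarded operation threw.
import java.util.ArrayList;
import java.util.List;

public class AuditLogSketch {
    static final List<String> AUDIT = new ArrayList<>();

    // Buggy shape: the finally block always records success.
    static long getContentSummaryBuggy(boolean fileExists) throws Exception {
        try {
            if (!fileExists) {
                throw new Exception("FileNotFoundException");
            }
            return 42L;
        } finally {
            AUDIT.add("cmd=contentSummary status=success"); // logged even on failure
        }
    }

    // Fixed shape: track success explicitly and log the real outcome.
    static long getContentSummaryFixed(boolean fileExists) throws Exception {
        boolean success = false;
        try {
            if (!fileExists) {
                throw new Exception("FileNotFoundException");
            }
            success = true;
            return 42L;
        } finally {
            AUDIT.add("cmd=contentSummary status=" + (success ? "success" : "failure"));
        }
    }
}
```

The success flag must be set only after the operation completes, so the finally block sees the true outcome regardless of how the try block exits.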
[jira] [Updated] (HDFS-9318) considerLoad factor can be improved
[ https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9318: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 3.0.0 Status: Resolved (was: Patch Available) Thanks for working on this, Kuhu. > considerLoad factor can be improved > --- > > Key: HDFS-9318 > URL: https://issues.apache.org/jira/browse/HDFS-9318 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch > > > Currently considerLoad avoids choosing nodes that are too active, so it helps > level the HDFS load across the cluster. Under normal conditions, this is > desired. However, when a cluster has a large percentage of nearly full nodes, > this can make it difficult to find good targets because the placement policy > wants to avoid the full nodes, but considerLoad wants to avoid the busy > less-full nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994298#comment-14994298 ] Hudson commented on HDFS-6481: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #637 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/637/]) HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 0b18e5e8c69b40c9a446fff448d38e0dd10cb45e) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java > DatanodeManager#getDatanodeStorageInfos() should check the length of > storageIDs > --- > > Key: HDFS-6481 > URL: https://issues.apache.org/jira/browse/HDFS-6481 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Ted Yu >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 2.7.2 > > Attachments: h6481_20151105.patch, hdfs-6481-v1.txt > > > Ian Brooks reported the following stack trace: > {code} > 2014-06-03 13:05:03,915 WARN [DataStreamer for file > /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200 > block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] > hdfs.DFSClient: DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): > 0 > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594) > at > 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266) > at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919) > at > org.apache.hadoop.
[jira] [Updated] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.
[ https://issues.apache.org/jira/browse/HDFS-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9394: Status: Patch Available (was: Open) > branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader > initialization, because HftpFileSystem is missing. > > > Key: HDFS-9394 > URL: https://issues.apache.org/jira/browse/HDFS-9394 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Chris Nauroth >Assignee: Mingliang Liu >Priority: Critical > Attachments: HDFS-9394.000.branch-2.patch > > > On branch-2, hadoop-hdfs-client contains a {{FileSystem}} service descriptor > that lists {{HftpFileSystem}} and {{HsftpFileSystem}}. These classes do not > reside in hadoop-hdfs-client. Instead, they reside in hadoop-hdfs. If the > application has hadoop-hdfs-client.jar on the classpath, but not > hadoop-hdfs.jar, then this can cause a {{ServiceConfigurationError}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.
[ https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HDFS-7163: - Attachment: HDFS-7163.003.patch Fixed the checkstyle and findbugs warnings. None of the unit tests listed above failed in my own build environment. > WebHdfsFileSystem should retry reads according to the configured retry policy. > -- > > Key: HDFS-7163 > URL: https://issues.apache.org/jira/browse/HDFS-7163 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 3.0.0, 2.5.1 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: HDFS-7163.001.patch, HDFS-7163.002.patch, > HDFS-7163.003.patch, WebHDFS Read Retry.pdf > > > In the current implementation of WebHdfsFileSystem, opens are retried > according to the configured retry policy, but not reads. Therefore, if a > connection goes down while data is being read, the read will fail and the > read will have to be retried by the client code. > Also, after a connection has been established, the next read (or seek/read) > will fail and the read will have to be restarted by the client code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in client package
[ https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994284#comment-14994284 ] Mingliang Liu commented on HDFS-8971: - Thanks for reporting this [~szetszwo]. I totally agree with you that we should consider a one-line message in {{ByteArrayManager}}. It's certainly easier to read, especially in the case of multiple threads. Perhaps we can simply revert the changes in this class? I revisited the patch and the other classes should be fine. > Remove guards when calling LOG.debug() and LOG.trace() in client package > > > Key: HDFS-8971 > URL: https://issues.apache.org/jira/browse/HDFS-8971 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: build >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-8971.000.patch, HDFS-8971.001.patch > > > We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to > {{hadoop-hdfs-client}} module in JIRA > [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and > [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951], and > {{BlockReader}} in > [HDFS-8925|https://issues.apache.org/jira/browse/HDFS-8925]. Meanwhile, we > also replaced the _log4j_ log with the _slf4j_ logger. There was existing code > in the client package to guard the log when calling {{LOG.debug()}} and > {{LOG.trace()}}, e.g. in {{ShortCircuitCache.java}}, we have code like this: > {code:title=Trace with guards|borderStyle=solid} > 724 if (LOG.isTraceEnabled()) { > 725 LOG.trace(this + ": found waitable for " + key); > 726 } > {code} > In _slf4j_, this kind of guard is not necessary. We should clean the code by > removing the guard from the client package. > {code:title=Trace without guards|borderStyle=solid} > 724 LOG.trace("{}: found waitable for {}", this, key); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9318) considerLoad factor can be improved
[ https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994281#comment-14994281 ] Kihwal Lee commented on HDFS-9318: -- +1 lgtm > considerLoad factor can be improved > --- > > Key: HDFS-9318 > URL: https://issues.apache.org/jira/browse/HDFS-9318 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch > > > Currently considerLoad avoids choosing nodes that are too active, so it helps > level the HDFS load across the cluster. Under normal conditions, this is > desired. However, when a cluster has a large percentage of nearly full nodes, > this can make it difficult to find good targets because the placement policy > wants to avoid the full nodes, but considerLoad wants to avoid the busy > less-full nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994263#comment-14994263 ] Hudson commented on HDFS-6481: -- FAILURE: Integrated in Hadoop-trunk-Commit #8768 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8768/]) HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 0b18e5e8c69b40c9a446fff448d38e0dd10cb45e) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > DatanodeManager#getDatanodeStorageInfos() should check the length of > storageIDs > --- > > Key: HDFS-6481 > URL: https://issues.apache.org/jira/browse/HDFS-6481 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Ted Yu >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 2.7.2 > > Attachments: h6481_20151105.patch, hdfs-6481-v1.txt > > > Ian Brooks reported the following stack trace: > {code} > 2014-06-03 13:05:03,915 WARN [DataStreamer for file > /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200 > block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] > hdfs.DFSClient: DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): > 0 > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594) > at > 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266) > at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919) > at > org.apache.hadoop.hdfs.DFSOutputSt
[jira] [Updated] (HDFS-9258) NN should indicate which nodes are stale
[ https://issues.apache.org/jira/browse/HDFS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated HDFS-9258: -- Status: Patch Available (was: In Progress) > NN should indicate which nodes are stale > > > Key: HDFS-9258 > URL: https://issues.apache.org/jira/browse/HDFS-9258 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Kuhu Shukla > Attachments: HDFS-9258-v1.patch > > > Determining why the NN is not coming out of safemode is difficult - is it a > bug or pending block reports? If the number of nodes appears sufficient, but > there are missing blocks, it would be nice to know which nodes haven't block > reported (stale). Instead of forcing the NN to leave safemode prematurely, > the SE can first force block reports from stale nodes. > The datanode report and the web ui's node list should contain this > information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9236) Missing sanity check for block size during block recovery
[ https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-9236: Target Version/s: 2.8.0 (was: 2.7.3) > Missing sanity check for block size during block recovery > - > > Key: HDFS-9236 > URL: https://issues.apache.org/jira/browse/HDFS-9236 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu > Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, > HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, > HDFS-9236.006.patch, HDFS-9236.007.patch > > > Ran into an issue while running a test against faulty data-node code. > Currently in DataNode.java: > {code:java} > /** Block synchronization */ > void syncBlock(RecoveringBlock rBlock, > List<BlockRecord> syncList) throws IOException { > … > // Calculate the best available replica state. > ReplicaState bestState = ReplicaState.RWR; > … > // Calculate list of nodes that will participate in the recovery > // and the new block size > List<BlockRecord> participatingList = new ArrayList<>(); > final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId, > -1, recoveryId); > switch(bestState) { > … > case RBW: > case RWR: > long minLength = Long.MAX_VALUE; > for(BlockRecord r : syncList) { > ReplicaState rState = r.rInfo.getOriginalReplicaState(); > if(rState == bestState) { > minLength = Math.min(minLength, r.rInfo.getNumBytes()); > participatingList.add(r); > } > } > newBlock.setNumBytes(minLength); > break; > … > } > … > nn.commitBlockSynchronization(block, > newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false, > datanodes, storages); > } > {code} > This code is called by the DN coordinating the block recovery. In the above > case, it is possible for none of the rState (reported by DNs with copies of > the replica being recovered) to match the bestState. This can either be > caused by faulty DN code or stale/modified/corrupted files on DN. 
When this > happens, the DN will end up reporting a minLength of Long.MAX_VALUE. > Unfortunately, there is no check on the NN for replica length. See > FSNamesystem.java: > {code:java} > void commitBlockSynchronization(ExtendedBlock oldBlock, > long newgenerationstamp, long newlength, > boolean closeFile, boolean deleteblock, DatanodeID[] newtargets, > String[] newtargetstorages) throws IOException { > … > if (deleteblock) { > Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock); > boolean remove = iFile.removeLastBlock(blockToDel) != null; > if (remove) { > blockManager.removeBlock(storedBlock); > } > } else { > // update last block > if(!copyTruncate) { > storedBlock.setGenerationStamp(newgenerationstamp); > > // XXX block length is updated without any check <<< > storedBlock.setNumBytes(newlength); > } > … > if (closeFile) { > LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock > + ", file=" + src > + (copyTruncate ? ", newBlock=" + truncatedBlock > : ", newgenerationstamp=" + newgenerationstamp) > + ", newlength=" + newlength > + ", newtargets=" + Arrays.asList(newtargets) + ") successful"); > } else { > LOG.info("commitBlockSynchronization(" + oldBlock + ") successful"); > } > } > {code} > After this point the block length becomes Long.MAX_VALUE. Any subsequent > block report (even with correct length) will cause the block to be marked as > corrupted. Since this block could be the last block of the file, if this > happens and the client goes away, the NN won’t be able to recover the lease and > close the file because the last block is under-replicated. > I believe we need to have a sanity check for block size on both DN and NN to > prevent such a case from happening. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
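The failure mode and the proposed guard can be sketched in a few lines. This is an illustration only: `recoveredLength` mirrors the shape of the `syncBlock()` loop quoted above, and `isSaneLength` is a hypothetical helper, not an existing NameNode method:

```java
// Illustrative sketch of the HDFS-9236 failure mode and a possible guard.
public class BlockLengthSketch {
    // Mirrors the syncBlock() minimum-length loop: when no replica matches
    // bestState, the contributing list is empty and minLength is never
    // lowered from its Long.MAX_VALUE starting value.
    static long recoveredLength(long[] matchingReplicaLengths) {
        long minLength = Long.MAX_VALUE;
        for (long len : matchingReplicaLengths) {
            minLength = Math.min(minLength, len);
        }
        return minLength;
    }

    // Hypothetical sanity check for the commitBlockSynchronization() side:
    // a real block length is non-negative and can never be Long.MAX_VALUE,
    // so such a value should fail the commit rather than be stored.
    static boolean isSaneLength(long newlength) {
        return newlength >= 0 && newlength < Long.MAX_VALUE;
    }
}
```

Rejecting the commit at the NN (and refusing to report Long.MAX_VALUE at the DN) keeps a single faulty replica set from corrupting the stored block length.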
[jira] [Updated] (HDFS-9258) NN should indicate which nodes are stale
[ https://issues.apache.org/jira/browse/HDFS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated HDFS-9258: -- Attachment: HDFS-9258-v1.patch Added isStale to the jmx info. Added an {{isStale()}} method on DNInfo and replaced the old one wherever possible. Also, {{chooseDatanodesForCaching()}} was a static method called only once from {{addNewPendingCached()}}, which is non-static, so chooseDatanodesForCaching was made a non-static method. > NN should indicate which nodes are stale > > > Key: HDFS-9258 > URL: https://issues.apache.org/jira/browse/HDFS-9258 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Kuhu Shukla > Attachments: HDFS-9258-v1.patch > > > Determining why the NN is not coming out of safemode is difficult - is it a > bug or pending block reports? If the number of nodes appears sufficient, but > there are missing blocks, it would be nice to know which nodes haven't block > reported (stale). Instead of forcing the NN to leave safemode prematurely, > the SE can first force block reports from stale nodes. > The datanode report and the web ui's node list should contain this > information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.
[ https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994235#comment-14994235 ] Yongjun Zhang commented on HDFS-9249: - Thanks [~jojochuang] for the new rev, +1 pending jenkins. > NPE thrown if an IOException is thrown in NameNode. > - > > Key: HDFS-9249 > URL: https://issues.apache.org/jira/browse/HDFS-9249 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Attachments: HDFS-9249.001.patch, HDFS-9249.002.patch, > HDFS-9249.003.patch, HDFS-9249.004.patch, HDFS-9249.005.patch, > HDFS-9249.006.patch > > > This issue was found when running test case > TestBackupNode.testCheckpointNode, but upon closer look, the problem is not > due to the test case. > Looks like an IOException was thrown in > try { > initializeGenericKeys(conf, nsId, namenodeId); > initialize(conf); > try { > haContext.writeLock(); > state.prepareToEnterState(haContext); > state.enterState(haContext); > } finally { > haContext.writeUnlock(); > } > causing the namenode to stop, but the namesystem was not yet properly > instantiated, causing NPE. > I tried to reproduce locally, but to no avail. > Because I could not reproduce the bug, and the log does not indicate what > caused the IOException, I suggest make this a supportability JIRA to log the > exception for future improvement. 
> Stacktrace > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906) > at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:827) > at > org.apache.hadoop.hdfs.server.namenode.BackupNode.(BackupNode.java:89) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130) > The last few lines of log: > 2015-10-14 19:45:07,807 INFO namenode.NameNode > (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint] > 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started > (again) > 2015-10-14 19:45:07,808 INFO namenode.NameNode > (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is > hdfs://localhost:37835 > 2015-10-14 19:45:07,808 INFO namenode.NameNode > (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use > localhost:37835 to access this namenode/service. 
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster > (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster > 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem > (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for > active state > 2015-10-14 19:45:07,811 INFO namenode.FSEditLog > (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1 > 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem > (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting > 2015-10-14 19:45:07,811 INFO namenode.FSEditLog > (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time > for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of > syncs: 4 SyncTimes(ms): 2 1 > 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem > (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, > exiting > 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager > (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001 > -> > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003 > 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager > (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_001 > -> > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_001-000
[jira] [Updated] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.
[ https://issues.apache.org/jira/browse/HDFS-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9394: Attachment: HDFS-9394.000.branch-2.patch Thank you [~cnauroth] for reporting this. As [~wheat9] said, when we separated the classes to {{hadoop-hdfs-client}}, we tried to address this in [HDFS-9166]. I think the original patch should work just fine, but it was probably not fully committed. Hopefully the fix is simple. Let's see if the v0 patch works. > branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader > initialization, because HftpFileSystem is missing. > > > Key: HDFS-9394 > URL: https://issues.apache.org/jira/browse/HDFS-9394 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Chris Nauroth >Assignee: Mingliang Liu >Priority: Critical > Attachments: HDFS-9394.000.branch-2.patch > > > On branch-2, hadoop-hdfs-client contains a {{FileSystem}} service descriptor > that lists {{HftpFileSystem}} and {{HsftpFileSystem}}. These classes do not > reside in hadoop-hdfs-client. Instead, they reside in hadoop-hdfs. If the > application has hadoop-hdfs-client.jar on the classpath, but not > hadoop-hdfs.jar, then this can cause a {{ServiceConfigurationError}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
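The failure mode here is generic to `java.util.ServiceLoader` and can be demonstrated without Hadoop at all. The sketch below uses `Runnable` as a stand-in for `FileSystem` and a made-up provider name (`com.example.MissingProvider`): a `META-INF/services` descriptor that names a class absent from the classpath makes iteration throw `ServiceConfigurationError`, which is exactly what happens when hadoop-hdfs-client's descriptor lists `HftpFileSystem` but hadoop-hdfs.jar is missing.

```java
// Demonstrates the HDFS-9394 failure mode with a dangling ServiceLoader
// descriptor. Runnable stands in for org.apache.hadoop.fs.FileSystem and
// com.example.MissingProvider is a deliberately nonexistent class.
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

public class ServiceLoaderDemo {
    // Build a classpath directory whose service descriptor names a provider
    // class that does not exist on that classpath.
    static ClassLoader classpathWithDanglingDescriptor() {
        try {
            Path dir = Files.createTempDirectory("svc");
            Path desc = dir.resolve("META-INF/services/java.lang.Runnable");
            Files.createDirectories(desc.getParent());
            Files.write(desc, Arrays.asList("com.example.MissingProvider"));
            return new URLClassLoader(new URL[] {dir.toUri().toURL()}, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true iff every provider listed in the descriptors on the
    // given classpath can actually be loaded.
    static boolean providersLoad(ClassLoader cl) {
        try {
            for (Runnable r : ServiceLoader.load(Runnable.class, cl)) {
                r.hashCode(); // touch each provider instance
            }
            return true;
        } catch (ServiceConfigurationError e) {
            return false; // a listed provider class is missing
        }
    }
}
```

Because the error is raised lazily during iteration, the first `FileSystem.get()` call in an application is typically where the `ServiceConfigurationError` surfaces.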
[jira] [Commented] (HDFS-9328) Formalize coding standards for libhdfs++ and put them in a README.txt
[ https://issues.apache.org/jira/browse/HDFS-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994229#comment-14994229 ] Steve Loughran commented on HDFS-9328: -- think it's Power. Nobody owns up to Itanium as nobody has the power budget to build up a rack of enough nodes for 3x redundancy to work as a storage mechanism > Formalize coding standards for libhdfs++ and put them in a README.txt > - > > Key: HDFS-9328 > URL: https://issues.apache.org/jira/browse/HDFS-9328 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer >Priority: Blocker > Attachments: HDFS-9328.HDFS-8707.000.patch > > > We have 2-3 people working on this project full time and hopefully more > people will start contributing. In order to efficiently scale we need a > single, easy to find, place where developers can check to make sure they are > following the coding standards of this project to both save their time and > save the time of people doing code reviews. > The most practical place to do this seems like a README file in libhdfspp/. > The foundation of the standards is google's C++ guide found here: > https://google-styleguide.googlecode.com/svn/trunk/cppguide.html > Any exceptions to google's standards or additional restrictions need to be > explicitly enumerated so there is one single point of reference for all > libhdfs++ code standards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-9258) NN should indicate which nodes are stale
[ https://issues.apache.org/jira/browse/HDFS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-9258 started by Kuhu Shukla. - > NN should indicate which nodes are stale > > > Key: HDFS-9258 > URL: https://issues.apache.org/jira/browse/HDFS-9258 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Kuhu Shukla > > Determining why the NN is not coming out of safemode is difficult - is it a > bug or pending block reports? If the number of nodes appears sufficient, but > there are missing blocks, it would be nice to know which nodes haven't block > reported (stale). Instead of forcing the NN to leave safemode prematurely, > the SE can first force block reports from stale nodes. > The datanode report and the web ui's node list should contain this > information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9328) Formalize coding standards for libhdfs++ and put them in a README.txt
[ https://issues.apache.org/jira/browse/HDFS-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994193#comment-14994193 ]

James Clampffer commented on HDFS-9328:
---------------------------------------

Good idea on the markdown. I'd really like this to be a complete set of rules to avoid new-rule surprises down the road. I used short-circuit as an example because I happened to know that it'd be an exception. There are plenty of other places where I could see adding that sort of stuff if I were only concerned about x86-64. I'd hate for someone to work really hard on a patch that does some really cool but platform-specific optimizations and then have the idea shot down during code review.
[jira] [Commented] (HDFS-9328) Formalize coding standards for libhdfs++ and put them in a README.txt
[ https://issues.apache.org/jira/browse/HDFS-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994179#comment-14994179 ]

James Clampffer commented on HDFS-9328:
---------------------------------------

I'll change it to markdown as you and [~wheat9] suggested. Good idea about alignment/endianness. I'll get this running on an ARM machine in big-endian mode and see if anything shakes out of the existing code. Out of curiosity, what architectures are people running Hadoop/HDFS on that can't do unaligned accesses? Itanium or SPARC?
[jira] [Updated] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-6481:
--------------------------------
    Resolution: Fixed
  Hadoop Flags: Reviewed
 Fix Version/s: 2.7.2
        Status: Resolved  (was: Patch Available)

Committed this patch. It's a low-risk change, so I committed it to branch-2.7 too. Thanks for diagnosing and fixing this, [~szetszwo].

> DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-6481
>                 URL: https://issues.apache.org/jira/browse/HDFS-6481
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.3.0
>            Reporter: Ted Yu
>            Assignee: Tsz Wo Nicholas Sze
>            Priority: Minor
>              Labels: BB2015-05-TBR
>             Fix For: 2.7.2
>
>         Attachments: h6481_20151105.patch, hdfs-6481-v1.txt
>
> Ian Brooks reported the following stack trace:
> {code}
> 2014-06-03 13:05:03,915 WARN [DataStreamer for file /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200 block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): 0
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
>         at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java
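For context on the bug class: the ArrayIndexOutOfBoundsException above arises when the lookup indexes the storageIDs array with a datanode index, but the caller supplied fewer storage IDs than datanode IDs. A minimal, hypothetical Java sketch of the kind of defensive length check such a fix adds (class and method names here are illustrative, not the actual DatanodeManager code):

```java
import java.util.ArrayList;
import java.util.List;

public class StorageLookup {
    // Illustrative stand-in for a lookup that pairs each datanode ID with
    // its storage ID. Indexing storageIDs[i] unconditionally would throw
    // ArrayIndexOutOfBoundsException whenever the caller sends fewer
    // storage IDs than datanode IDs.
    static List<String> getStorages(String[] datanodeIDs, String[] storageIDs) {
        List<String> storages = new ArrayList<>();
        for (int i = 0; i < datanodeIDs.length; i++) {
            // Defensive length check: tolerate a short (or empty) storageIDs
            // array instead of throwing, falling back to a placeholder.
            String sid = (i < storageIDs.length) ? storageIDs[i] : "unknown";
            storages.add(datanodeIDs[i] + "/" + sid);
        }
        return storages;
    }

    public static void main(String[] args) {
        // Two datanodes but only one storage ID: an unchecked version throws.
        List<String> out = getStorages(new String[] {"dn1", "dn2"},
                                       new String[] {"s1"});
        if (!out.get(0).equals("dn1/s1") || !out.get(1).equals("dn2/unknown")) {
            throw new AssertionError(out);
        }
        System.out.println("ok");
    }
}
```

The actual patch's behavior (e.g. whether it substitutes a default, trims the list, or reports an error to the client) is in the attached h6481_20151105.patch; this sketch only shows why a bounds check at the indexing site removes the AIOOBE.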