[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block
[ https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124998#comment-17124998 ]

hemanthboyina commented on HDFS-15375:
--------------------------------------

The test failures were not related.
{quote}We can't remove {{pendingNum}} from here, it will create extra replication task if this count doesn't include pendingNum
{quote}
I think it does not create an extra replication task, because the pendingNum count is only used to select the priority level at which the block is added or updated in the priority queue.

> Reconstruction Work should not happen for Corrupt Block
> -------------------------------------------------------
>
>                 Key: HDFS-15375
>                 URL: https://issues.apache.org/jira/browse/HDFS-15375
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: hemanthboyina
>            Assignee: hemanthboyina
>            Priority: Major
>         Attachments: HDFS-15375-testrepro.patch, HDFS-15375.001.patch
>
>
> In BlockManager#updateNeededReconstructions, while updating
> neededReconstruction we add the pendingReconstruction count to the live
> replicas:
> {code:java}
> int pendingNum = pendingReconstruction.getNumReplicas(block);
> int curExpectedReplicas = getExpectedRedundancyNum(block);
> if (!hasEnoughEffectiveReplicas(block, repl, pendingNum)) {
>   neededReconstruction.update(block, repl.liveReplicas() +
>       pendingNum,{code}
> But if two replicas are in pending reconstruction (due to corruption) and
> the third replica is then corrupted, the block should be in
> QUEUE_WITH_CORRUPT_BLOCKS. Because of the above logic it was being added to
> QUEUE_LOW_REDUNDANCY instead, which makes the RedundancyMonitor try to
> reconstruct a corrupted block, which is wrong.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
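The queue selection described in the issue can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the class `PrioritySketch`, its enum, and the method `priorityFor` are hypothetical names, and the thresholds are a simplified approximation of how a replicated block might be ranked (read-only and out-of-service replicas are ignored). It is not the Hadoop source.

```java
/** Simplified, hypothetical model of priority-queue selection for a replicated block. */
final class PrioritySketch {

  /** Queue names loosely follow the ones mentioned in the issue. */
  enum Queue {
    HIGHEST_PRIORITY,
    VERY_LOW_REDUNDANCY,
    LOW_REDUNDANCY,
    REPLICAS_BADLY_DISTRIBUTED,
    WITH_CORRUPT_BLOCKS
  }

  /**
   * Picks a queue from the replica count handed in by the caller.
   * Assumption: the caller decides whether pending replicas are included.
   */
  static Queue priorityFor(int curReplicas, int expectedReplicas) {
    if (curReplicas >= expectedReplicas) {
      return Queue.REPLICAS_BADLY_DISTRIBUTED;
    }
    if (curReplicas == 0) {
      return Queue.WITH_CORRUPT_BLOCKS; // nothing usable is left
    }
    if (curReplicas == 1) {
      return Queue.HIGHEST_PRIORITY; // one more loss means data loss
    }
    if (curReplicas * 3 < expectedReplicas) {
      return Queue.VERY_LOW_REDUNDANCY;
    }
    return Queue.LOW_REDUNDANCY;
  }

  public static void main(String[] args) {
    int live = 0;      // all three replicas are corrupt
    int pending = 2;   // two reconstructions are still pending
    int expected = 3;

    // Counting pending replicas hides the fact that no live replica exists.
    System.out.println(priorityFor(live + pending, expected));
    // Using only live replicas classifies the block as corrupt.
    System.out.println(priorityFor(live, expected));
  }
}
```

In this model the first call lands in `LOW_REDUNDANCY` and the second in `WITH_CORRUPT_BLOCKS`, which matches the mismatch reported in the description.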
[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block
[ https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124263#comment-17124263 ]

Hadoop QA commented on HDFS-15375:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 54s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 51s | trunk passed |
| +1 | compile | 1m 12s | trunk passed |
| +1 | checkstyle | 0m 51s | trunk passed |
| +1 | mvnsite | 1m 17s | trunk passed |
| +1 | shadedclient | 15m 59s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 46s | trunk passed |
| 0 | spotbugs | 2m 55s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 53s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 1m 0s | the patch passed |
| +1 | javac | 1m 0s | the patch passed |
| +1 | checkstyle | 0m 43s | the patch passed |
| +1 | mvnsite | 1m 8s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 47s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 41s | the patch passed |
| +1 | findbugs | 2m 56s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 96m 28s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 42s | The patch does not generate ASF License warnings. |
| | | 163m 33s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
| | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
| | hadoop.hdfs.TestReconstructStripedFile |
| | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
| | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29398/artifact/out/Dockerfile |
| JIRA Issue | HDFS-15375 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13004052/HDFS-15375.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux eba48cf25629 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Person
[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block
[ https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124099#comment-17124099 ]

Surendra Singh Lilhore commented on HDFS-15375:
-----------------------------------------------

Triggered one build to check the impact of this patch.
[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block
[ https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124096#comment-17124096 ]

Surendra Singh Lilhore commented on HDFS-15375:
-----------------------------------------------

{quote}- neededReconstruction.update(block, repl.liveReplicas() + pendingNum,{quote}
We can't remove {{pendingNum}} from here; it will create an extra replication task if this count doesn't include pendingNum. In your case, all the replicas being corrupted means the live replica count will be zero. You can add some logic based on a zero live replica check.
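One possible reading of the zero live replica check suggested above is sketched here. The class `LiveZeroCheckSketch` and the helper `replicasForPriority` are hypothetical names introduced only for illustration; this is not the patch attached to the issue, and the real method takes more arguments than this model shows.

```java
/** Hypothetical sketch: keep pendingNum in the count unless no live replica is left. */
final class LiveZeroCheckSketch {

  /**
   * Returns the replica count to use when choosing a priority queue.
   * Assumption: with zero live replicas the block should be treated as
   * corrupt, so pending reconstructions are not counted in that case.
   */
  static int replicasForPriority(int liveReplicas, int pendingNum) {
    if (liveReplicas == 0) {
      return 0;
    }
    return liveReplicas + pendingNum;
  }

  public static void main(String[] args) {
    // All replicas corrupt, two reconstructions pending: count stays 0.
    System.out.println(replicasForPriority(0, 2));
    // One live replica, one pending: pending still counts, so no extra task is implied.
    System.out.println(replicasForPriority(1, 1));
  }
}
```

The point of the design is that `pendingNum` keeps its existing role whenever at least one live replica exists, and only the all-corrupt case is special-cased.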
[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block
[ https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119811#comment-17119811 ]

hemanthboyina commented on HDFS-15375:
--------------------------------------

Thanks [~surendrasingh] for the comment.

We have a configuration, dfs.namenode.reconstruction.pending.timeout-sec, which defaults to 5 minutes. After 5 minutes the blocks in pending reconstruction time out and are moved to needed reconstruction by the redundancy monitor thread, so only at that point will the block be placed in QUEUE_WITH_CORRUPT_BLOCKS.

fsck also uses this priority queue, QUEUE_WITH_CORRUPT_BLOCKS, to get the corrupt blocks, so a data mismatch will happen there too.
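The timing argument above can be sketched as a plain timeout check. The class `PendingTimeoutSketch`, the constant `TIMEOUT`, and the method `isTimedOut` are hypothetical; the 5-minute value is taken from the comment, and the real bookkeeping in the NameNode is more involved than this model.

```java
import java.time.Duration;

/** Hypothetical model of a pending-reconstruction entry timing out. */
final class PendingTimeoutSketch {

  // Assumed to mirror the 5-minute default described for
  // dfs.namenode.reconstruction.pending.timeout-sec.
  static final Duration TIMEOUT = Duration.ofMinutes(5);

  /** True once more than `timeout` has elapsed since the work was scheduled. */
  static boolean isTimedOut(long scheduledAtMillis, long nowMillis, Duration timeout) {
    return nowMillis - scheduledAtMillis > timeout.toMillis();
  }

  public static void main(String[] args) {
    long scheduledAt = 0L;
    // After 4 minutes the entry is still pending, so the block stays
    // in whichever queue it was placed earlier.
    System.out.println(isTimedOut(scheduledAt, Duration.ofMinutes(4).toMillis(), TIMEOUT));
    // After 6 minutes it has timed out and can be re-evaluated.
    System.out.println(isTimedOut(scheduledAt, Duration.ofMinutes(6).toMillis(), TIMEOUT));
  }
}
```

Under these assumptions, the window before the timeout is the period during which the block could be reported inconsistently.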
[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block
[ https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119717#comment-17119717 ]

Surendra Singh Lilhore commented on HDFS-15375:
-----------------------------------------------

[~hemanthboyina], thanks for the patch.

One doubt: without this fix, how long will the block take to come out of QUEUE_LOW_REDUNDANCY if the third replica is also corrupted?
[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block
[ https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117980#comment-17117980 ]

hemanthboyina commented on HDFS-15375:
--------------------------------------

I ran the failing tests locally; the failures do not seem related to this patch.

org.apache.hadoop.hdfs.TestReconstructStripedFile.testErasureCodingWorkerXmitsWeight
org.apache.hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy.testErasureCodingWorkerXmitsWeight

These tests were failing even without this patch. Following up on them, I found they have been failing continuously:

[https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]
[https://builds.apache.org/job/PreCommit-HDFS-Build/29366/|https://builds.apache.org/job/PreCommit-HDFS-Build/29366/#showFailuresLink]
[https://builds.apache.org/job/PreCommit-HDFS-Build/29358/]
[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block
[ https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117035#comment-17117035 ]

Hadoop QA commented on HDFS-15375:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 21m 25s | trunk passed |
| +1 | compile | 1m 9s | trunk passed |
| +1 | checkstyle | 0m 47s | trunk passed |
| +1 | mvnsite | 1m 16s | trunk passed |
| +1 | shadedclient | 17m 30s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 41s | trunk passed |
| 0 | spotbugs | 3m 0s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 58s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 1m 3s | the patch passed |
| +1 | javac | 1m 3s | the patch passed |
| +1 | checkstyle | 0m 42s | the patch passed |
| +1 | mvnsite | 1m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 7s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | the patch passed |
| +1 | findbugs | 3m 6s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 109m 14s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 36s | The patch does not generate ASF License warnings. |
| | | 179m 59s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestCrcCorruption |
| | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
| | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
| | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
| | hadoop.hdfs.TestReconstructStripedFile |
| | hadoop.hdfs.TestStripedFileAppend |
| | hadoop.hdfs.server.namenode.ha.TestHAAppend |
| | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
| | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29370/artifact/out/Dockerfile |
| JIRA Issue | HDFS-15375 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13004052/HDFS-15375.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 2b66a89f40a6 4.15.0-101