[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-06-03 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124998#comment-17124998
 ] 

hemanthboyina commented on HDFS-15375:
--

The test failures are not related to this patch.
{quote}We can't remove {{pendingNum}} from here, it will create extra 
replication task if this count doesn't include pendingNum
{quote}
I think it does not create an extra replication task, because the pendingNum 
count here is only used to select the priority level at which the block is 
added or updated in the priority queue.

> Reconstruction Work should not happen for Corrupt Block
> ---
>
> Key: HDFS-15375
> URL: https://issues.apache.org/jira/browse/HDFS-15375
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15375-testrepro.patch, HDFS-15375.001.patch
>
>
> In BlockManager#updateNeededReconstructions, while updating 
> neededReconstruction we add the pendingReconstruction replica count to the 
> live replica count:
> {code:java}
> int pendingNum = pendingReconstruction.getNumReplicas(block);
> int curExpectedReplicas = getExpectedRedundancyNum(block);
> if (!hasEnoughEffectiveReplicas(block, repl, pendingNum)) {
>   neededReconstruction.update(block, repl.liveReplicas() + pendingNum,{code}
> But if two replicas are in pending reconstruction (due to corruption) and 
> the third replica is also corrupted, the block should be in 
> QUEUE_WITH_CORRUPT_BLOCKS. Because of the logic above it is added to 
> QUEUE_LOW_REDUNDANCY instead, which makes the RedundancyMonitor try to 
> reconstruct a corrupted block, which is wrong.
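
The scenario in the description can be sketched in isolation. The class below is a simplified, hypothetical model of the queue selection, not the actual HDFS code: the name `QueuePrioritySketch`, the method `selectQueue`, and the exact thresholds are assumptions made for illustration, and the real selection logic takes more inputs (for example read-only and out-of-service replicas). It only shows how passing live + pending instead of live changes the chosen queue.

```java
final class QueuePrioritySketch {
    // Queue names follow the ones mentioned in this issue; the enum itself is illustrative.
    enum Queue {
        QUEUE_HIGHEST_PRIORITY,
        QUEUE_VERY_LOW_REDUNDANCY,
        QUEUE_LOW_REDUNDANCY,
        QUEUE_WITH_CORRUPT_BLOCKS
    }

    // Simplified selection: the count passed in decides the queue (assumed thresholds).
    static Queue selectQueue(int curReplicas, int expectedReplicas) {
        if (curReplicas == 0) {
            return Queue.QUEUE_WITH_CORRUPT_BLOCKS; // nothing usable is left
        }
        if (curReplicas == 1) {
            return Queue.QUEUE_HIGHEST_PRIORITY;
        }
        if (curReplicas * 3 < expectedReplicas) {
            return Queue.QUEUE_VERY_LOW_REDUNDANCY;
        }
        return Queue.QUEUE_LOW_REDUNDANCY;
    }

    public static void main(String[] args) {
        int live = 0;      // the third replica is corrupted too
        int pending = 2;   // two replicas still counted in pending reconstruction
        int expected = 3;

        // Counting pending replicas hides the fact that no live replica exists.
        System.out.println(selectQueue(live + pending, expected)); // QUEUE_LOW_REDUNDANCY
        // Counting only live replicas classifies the block as corrupt.
        System.out.println(selectQueue(live, expected));           // QUEUE_WITH_CORRUPT_BLOCKS
    }
}
```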



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-06-02 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124263#comment-17124263
 ] 

Hadoop QA commented on HDFS-15375:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 54s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 46s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 55s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 28s{color} | {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 33s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
|   | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29398/artifact/out/Dockerfile |
| JIRA Issue | HDFS-15375 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13004052/HDFS-15375.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux eba48cf25629 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Person

[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-06-02 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124099#comment-17124099
 ] 

Surendra Singh Lilhore commented on HDFS-15375:
---

Triggered one build to check the impact of this patch. 




[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-06-02 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124096#comment-17124096
 ] 

Surendra Singh Lilhore commented on HDFS-15375:
---

{quote}-                 neededReconstruction.update(block, repl.liveReplicas() + pendingNum,{quote}
We can't remove {{pendingNum}} from here; it will create extra replication 
tasks if this count doesn't include {{pendingNum}}. In your case all the 
replicas are corrupted, which means the live replica count will be zero. You 
can add some logic based on a zero-live-replica check.
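
One possible reading of this suggestion, as a minimal sketch: keep {{pendingNum}} in the count in the normal case, but stop counting pending replicas once no live replica remains. The class and method names below (`LiveReplicaGuardSketch`, `replicasForPriority`) are hypothetical and this is not the committed patch; whether the guard is correct for every block state would need to be checked against the real BlockManager code.

```java
final class LiveReplicaGuardSketch {
    /**
     * Returns the replica count to hand to the priority queue update.
     * Assumption for this sketch: with zero live replicas there is no healthy
     * source left, so pending replicas are not counted.
     */
    static int replicasForPriority(int liveReplicas, int pendingNum) {
        if (liveReplicas == 0) {
            return 0; // lets the block be classified as corrupt
        }
        return liveReplicas + pendingNum; // unchanged behaviour otherwise
    }

    public static void main(String[] args) {
        System.out.println(replicasForPriority(0, 2)); // all replicas corrupted
        System.out.println(replicasForPriority(1, 2)); // one live replica, two pending
    }
}
```

With this shape, the case where live replicas exist still counts pending work, so the extra replication tasks mentioned above are not introduced by the guard itself.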




[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-05-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119811#comment-17119811
 ] 

hemanthboyina commented on HDFS-15375:
--

Thanks [~surendrasingh] for the comment.

We have a configuration, dfs.namenode.reconstruction.pending.timeout-sec, which 
defaults to 5 minutes. After 5 minutes the blocks in pending reconstruction 
time out and are moved to needed reconstruction by the redundancy monitor 
thread; at that point the block is placed in QUEUE_WITH_CORRUPT_BLOCKS.

fsck also uses this priority queue (QUEUE_WITH_CORRUPT_BLOCKS) to get corrupt 
blocks, so a data mismatch will happen there too.
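
The timeout behaviour described above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the HDFS implementation: the class `PendingTimeoutSketch`, its methods, and the use of string block ids are hypothetical, and the 5 minute value simply mirrors the default mentioned in this comment. Time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PendingTimeoutSketch {
    // Mirrors the 5 minute default described for the pending timeout (assumed here).
    static final long TIMEOUT_MS = 5 * 60 * 1000L;

    // Block id -> time (ms) at which the reconstruction was scheduled.
    private final Map<String, Long> pending = new LinkedHashMap<>();

    void addPending(String blockId, long nowMs) {
        pending.put(blockId, nowMs);
    }

    // Removes and returns the blocks whose pending entry is older than the timeout.
    // In the flow described above, these would go back to needed reconstruction.
    List<String> drainTimedOut(long nowMs) {
        List<String> timedOut = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > TIMEOUT_MS) {
                timedOut.add(entry.getKey());
                it.remove();
            }
        }
        return timedOut;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingTimeoutSketch sketch = new PendingTimeoutSketch();
        sketch.addPending("blk_1", 0L);
        System.out.println(sketch.drainTimedOut(60_000L));      // still within the timeout
        System.out.println(sketch.drainTimedOut(TIMEOUT_MS + 1)); // now timed out
    }
}
```

Until the entry times out, the block keeps its pending count, which is the window in which the queue placement discussed in this issue stays wrong.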




[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-05-29 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119717#comment-17119717
 ] 

Surendra Singh Lilhore commented on HDFS-15375:
---

[~hemanthboyina], thanks for the patch.

One doubt: without this fix, how much time will it take for the block to come 
out of QUEUE_LOW_REDUNDANCY if the third replica is also corrupted?




[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-05-27 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117980#comment-17117980
 ] 

hemanthboyina commented on HDFS-15375:
--

I ran the failed tests locally; the failures seem unrelated to this patch.

org.apache.hadoop.hdfs.TestReconstructStripedFile.testErasureCodingWorkerXmitsWeight
org.apache.hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy.testErasureCodingWorkerXmitsWeight

These tests were failing even without this patch. Following up on them, I 
found they have been failing continuously:

[https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]

[https://builds.apache.org/job/PreCommit-HDFS-Build/29366/|https://builds.apache.org/job/PreCommit-HDFS-Build/29366/#showFailuresLink]

[https://builds.apache.org/job/PreCommit-HDFS-Build/29358/]




[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-05-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117035#comment-17117035
 ] 

Hadoop QA commented on HDFS-15375:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 41s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  0s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 58s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m  7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}109m 14s{color} | {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}179m 59s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
|   | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestStripedFileAppend |
|   | hadoop.hdfs.server.namenode.ha.TestHAAppend |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29370/artifact/out/Dockerfile |
| JIRA Issue | HDFS-15375 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13004052/HDFS-15375.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 2b66a89f40a6 4.15.0-101