[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289934#comment-17289934 ]

Hadoop QA commented on HDFS-15422:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 12m 16s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-2.10 Compile Tests || ||
| +1 | mvninstall | 16m 13s | | branch-2.10 passed |
| +1 | compile | 0m 57s | | branch-2.10 passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | compile | 0m 51s | | branch-2.10 passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~16.04-b08 |
| +1 | checkstyle | 0m 35s | | branch-2.10 passed |
| +1 | mvnsite | 1m 1s | | branch-2.10 passed |
| +1 | javadoc | 1m 15s | | branch-2.10 passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | javadoc | 0m 49s | | branch-2.10 passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~16.04-b08 |
| 0 | spotbugs | 2m 26s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 24s | | branch-2.10 passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 53s | | the patch passed |
| +1 | compile | 0m 50s | | the patch passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | javac | 0m 50s | | the patch passed |
| +1 | compile | 0m 45s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~16.04-b08 |
| +1 | javac | 0m 45s | | the patch passed |
| +1 | checkstyle | 0m 30s | | the patch passed |
| +1 | mvnsite | 0m 55s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | javadoc | 1m 13s | | the patch passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | javadoc | 0m 47s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~16.04-b08 |
| +1 | findbugs | 2m 33s | | the patch passed |
|| || || || Other
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289872#comment-17289872 ]

Stephen O'Donnell commented on HDFS-15422:
------------------------------------------

Thanks for the review [~weichiu]. I committed this to trunk and the cherry-pick was clean down to 3.1. I have pushed a new identical 2.10 patch to re-trigger Jenkins. If that comes back clean I will commit there too.

> Reported IBR is partially replaced with stored info when queuing.
> -----------------------------------------------------------------
>
> Key: HDFS-15422
> URL: https://issues.apache.org/jira/browse/HDFS-15422
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Stephen O'Donnell
> Priority: Critical
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15422-branch-2.10.001.patch,
> HDFS-15422-branch-2.10.002.patch, HDFS-15422.001.patch
>
>
> When queueing an IBR (incremental block report) on a standby namenode, some
> of the reported information is being replaced with the existing stored
> information. This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting missing
> blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks that had been
> appended, and the sizes were actually correct on the datanodes. Upon further
> investigation, it was determined that the namenode was queueing IBRs with
> altered information.
> Although it sounds bad, I am not making it a blocker.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289615#comment-17289615 ]

Wei-Chiu Chuang commented on HDFS-15422:
----------------------------------------

I think the patch looks reasonable. I am +1.

bq. // TODO: Pretty confident this should be s/storedBlock/block below,

I had stared at this line years before and wondered why it was not updated :)
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289016#comment-17289016 ]

Stephen O'Donnell commented on HDFS-15422:
------------------------------------------

[~kihwal] Have you been running the change you suggested on your internal clusters? If so, have you noticed any problems or, better, seen it fix the issue?
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17287710#comment-17287710 ]

Stephen O'Donnell commented on HDFS-15422:
------------------------------------------

The failure of TestUnderReplicatedBlocks.testSetRepIncWithUnderReplicatedBlocks gave me some concern, so I ran it 6 times locally on my branch and 6 times locally on trunk without this change. In both cases it passed 4 times and timed out twice, so it looks like the test is somewhat flaky.
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17287377#comment-17287377 ]

Hadoop QA commented on HDFS-15422:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 53s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 19m 34s | | trunk passed |
| +1 | compile | 1m 20s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 1m 12s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 2s | | trunk passed |
| +1 | mvnsite | 1m 20s | | trunk passed |
| +1 | shadedclient | 15m 11s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 53s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 1m 28s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 3m 2s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 58s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 1m 12s | | the patch passed |
| +1 | compile | 1m 10s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 1m 10s | | the patch passed |
| +1 | compile | 1m 7s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 1m 7s | | the patch passed |
| +1 | checkstyle | 0m 54s | | the patch passed |
| +1 | mvnsite | 1m 14s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 36s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 49s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17287194#comment-17287194 ]

Stephen O'Donnell commented on HDFS-15422:
------------------------------------------

We have had a report of something that sounds very much like this problem: blocks were appended, a failover happened, and the blocks were then marked corrupt, although they are still readable. I suspect that failing back over and restarting the new SBNN will clear it, but I am waiting to confirm that. This is on Cloudera CDP 7.x, which is a heavily patched 3.1 build.

It looks like this problem is still there on trunk. From earlier comments, it sounds like a unit test for this is very difficult, so I will post a trunk patch with the small change [~kihwal] suggested and see what Yetus says.
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240491#comment-17240491 ]

Ravuri Sushma sree commented on HDFS-15422:
-------------------------------------------

Thank you everyone for the discussion here. Can anyone let me know if this issue is reproducible in a unit test (UT)?
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191255#comment-17191255 ]

Masatake Iwasaki commented on HDFS-15422:
-----------------------------------------

I set the target version to 2.10.2.
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191254#comment-17191254 ]

Masatake Iwasaki commented on HDFS-15422:
-----------------------------------------

I could not make a test case which fails without the fix. If BlockManager#checkReplicaCorrupt returns BlockToMarkCorrupt in the conditional below, the reported issue might be reproduced, but I could not meet that condition just by tweaking the timing of edit log replay in the standby and of block reports.

{code:java}
switch(reportedState) {
case FINALIZED:
  switch(ucState) {
  case COMPLETE:
  case COMMITTED:
    if (storedBlock.getGenerationStamp() != reported.getGenerationStamp()) {
      final long reportedGS = reported.getGenerationStamp();
      return new BlockToMarkCorrupt(storedBlock, reportedGS,
          "block is " + ucState + " and reported genstamp " + reportedGS
          + " does not match genstamp in block map "
          + storedBlock.getGenerationStamp(), Reason.GENSTAMP_MISMATCH);
    } else if (storedBlock.getNumBytes() != reported.getNumBytes()) {
      return new BlockToMarkCorrupt(storedBlock,
          "block is " + ucState + " and reported length "
          + reported.getNumBytes() + " does not match "
          + "length in block map " + storedBlock.getNumBytes(),
          Reason.SIZE_MISMATCH);
{code}
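For readers following the thread, the order of checks in the excerpt quoted in this message can be sketched in isolation. This is only a minimal standalone illustration, not the HDFS implementation: the class ReplicaCheckSketch, the method check and the NONE value are hypothetical names introduced here, and the sketch covers only the FINALIZED replica against a COMPLETE or COMMITTED stored block case.

```java
/** Hypothetical, simplified stand-in for the quoted comparison; not HDFS code. */
final class ReplicaCheckSketch {
  /** GENSTAMP_MISMATCH and SIZE_MISMATCH mirror the quoted reasons; NONE is an assumption. */
  enum Reason { NONE, GENSTAMP_MISMATCH, SIZE_MISMATCH }

  /**
   * Compares stored and reported values in the same order as the excerpt:
   * the generation stamp first, and the length only if the stamps agree.
   */
  static Reason check(long storedGS, long storedLen, long reportedGS, long reportedLen) {
    if (storedGS != reportedGS) {
      return Reason.GENSTAMP_MISMATCH;
    } else if (storedLen != reportedLen) {
      return Reason.SIZE_MISMATCH;
    }
    return Reason.NONE;
  }

  public static void main(String[] args) {
    // Same generation stamp, different length: only this reaches the size check.
    System.out.println(check(2L, 100L, 2L, 150L));
    // Different generation stamp: reported before the length is even compared.
    System.out.println(check(1L, 100L, 2L, 150L));
  }
}
```

Under this simplified reading, a size mismatch is only reported when the generation stamps already agree, which may be part of why timing changes alone did not reach that branch.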
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191048#comment-17191048 ]

Masatake Iwasaki commented on HDFS-15422:
-----------------------------------------

The code was introduced by HDFS-2742 (in the HDFS-1623 branch). I think the fix makes sense. I did not see any relevant test failure in either the QA build or my local run.

I'm trying to add a test case (based on TestDNFencing#testQueueingWithAppend) for the SIZE_MISMATCH after append mentioned by [~kihwal]. If I cannot make it work in a few days, I will update the target version to 2.10.2.
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190958#comment-17190958 ]

Hadoop QA commented on HDFS-15422:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 24s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-2.10 Compile Tests ||
| +1 | mvninstall | 13m 59s | branch-2.10 passed |
| +1 | compile | 1m 0s | branch-2.10 passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | compile | 0m 50s | branch-2.10 passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 |
| +1 | checkstyle | 0m 34s | branch-2.10 passed |
| +1 | mvnsite | 0m 59s | branch-2.10 passed |
| +1 | javadoc | 1m 17s | branch-2.10 passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | javadoc | 0m 48s | branch-2.10 passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 |
| 0 | spotbugs | 2m 39s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 37s | branch-2.10 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 55s | the patch passed |
| +1 | compile | 0m 52s | the patch passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | javac | 0m 52s | the patch passed |
| +1 | compile | 0m 46s | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 |
| +1 | javac | 0m 46s | the patch passed |
| +1 | checkstyle | 0m 28s | the patch passed |
| +1 | mvnsite | 0m 54s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 1m 10s | the patch passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | javadoc | 0m 43s | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 |
| +1 | findbugs | 2m 34s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 81m 18s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 43s | The patch does not generate ASF License warnings. |
| | | 116m 30s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190943#comment-17190943 ]

Masatake Iwasaki commented on HDFS-15422:
-----------------------------------------

I'm looking into the code, and in the meantime I am attaching the diff to kick off the unit tests.
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180482#comment-17180482 ]

Fei Hui commented on HDFS-15422:
--------------------------------

[~kihwal] Thanks for reporting and the fix. Can we push this fix to trunk?
[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140618#comment-17140618 ]

Kihwal Lee commented on HDFS-15422:
-----------------------------------

The fix is simple.

{code}
@@ -2578,10 +2578,7 @@ private BlockInfo processReportedBlock(
         // If the block is an out-of-date generation stamp or state,
         // but we're the standby, we shouldn't treat it as corrupt,
         // but instead just queue it for later processing.
-        // TODO: Pretty confident this should be s/storedBlock/block below,
-        // since we should be postponing the info of the reported block, not
-        // the stored block. See HDFS-6289 for more context.
-        queueReportedBlock(storageInfo, storedBlock, reportedState,
+        queueReportedBlock(storageInfo, block, reportedState,
             QUEUE_REASON_CORRUPT_STATE);
       } else {
         toCorrupt.add(c);
{code}

If the old information in memory ({{storedBlock}}) is used in queueing a report, the size may be old. Unlike GENSTAMP_MISMATCH, this kind of corruption cannot be undone when the NN sees a correct report again. I.e. forcing a block report won't fix this condition.
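The effect of the one-line change in this message can be illustrated with a toy model. This is a rough sketch under loud assumptions, not HDFS code: QueuedReportSketch, Blk, queue, replayFlagsSizeMismatch and simulate are hypothetical names, the lengths 100 and 150 are made up, and generation stamps and replica states are deliberately ignored so that only the length is modelled.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of a postponed block report on a standby; hypothetical names throughout. */
final class QueuedReportSketch {
  /** Minimal stand-in for a block: just an id and a length. */
  static final class Blk {
    final long id;
    final long numBytes;
    Blk(long id, long numBytes) {
      this.id = id;
      this.numBytes = numBytes;
    }
  }

  /** Postpones a report by queueing a copy of whichever block object is passed in. */
  static void queue(Queue<Blk> pending, Blk toQueue) {
    pending.add(new Blk(toQueue.id, toQueue.numBytes));
  }

  /** Replays one queued report and says whether its length disagrees with the stored length. */
  static boolean replayFlagsSizeMismatch(Queue<Blk> pending, Blk storedNow) {
    Blk queued = pending.remove();
    return queued.numBytes != storedNow.numBytes;
  }

  /** Runs the scenario, queueing either the stale stored block or the reported block. */
  static boolean simulate(boolean queueStoredBlock) {
    Blk storedOld = new Blk(1L, 100L);   // what the standby has in memory before catching up
    Blk reported = new Blk(1L, 150L);    // what the datanode reports after the append
    Queue<Blk> pending = new ArrayDeque<>();
    queue(pending, queueStoredBlock ? storedOld : reported);
    Blk storedAfterEdits = new Blk(1L, 150L); // stored info once the edits are applied
    return replayFlagsSizeMismatch(pending, storedAfterEdits);
  }

  public static void main(String[] args) {
    System.out.println("queue storedBlock -> size mismatch: " + simulate(true));
    System.out.println("queue reported block -> size mismatch: " + simulate(false));
  }
}
```

In this model, queueing the stored block discards the length the datanode actually reported, so the replayed report disagrees with the up-to-date stored length even though the replica is fine. Queueing the reported block keeps the two consistent.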