[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084738#comment-14084738 ]

Hudson commented on HDFS-5723:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1852 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1852/])
HDFS-5723. Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction. Contributed by Vinayakumar B. (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615491)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java

> Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5723
>                 URL: https://issues.apache.org/jira/browse/HDFS-5723
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.2.0
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>             Fix For: 2.6.0
>
>         Attachments: HDFS-5723.patch, HDFS-5723.patch, HDFS-5723.patch
>
>
> Scenario:
> 1. 3-node cluster with dfs.client.block.write.replace-datanode-on-failure.enable set to false.
> 2. One file is written with 3 replicas, blk_id_gs1.
> 3. One of the datanodes, DN1, goes down.
> 4. The file is opened for append, and more data is added to the file and synced (to only the 2 live nodes, DN2 and DN3) -- blk_id_gs2.
> 5. DN1 is restarted.
> 6. In its block report, DN1 reports the FINALIZED block blk_id_gs1; this should be marked corrupt.
> But since the NN has the appended block in the UnderConstruction state, at this time it does not detect this block as corrupt and adds it to the valid block locations.
> As long as the namenode is alive, this datanode will also be considered a valid replica, and read/append will fail on that datanode.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
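The condition described in the scenario can be sketched as a small predicate. This is a simplified illustration only, not the committed change to BlockManager.java: the class, enum, and method names below are hypothetical, and the sketch assumes that a FINALIZED replica is stale whenever its generation stamp is older than the one the NN holds for a block that is still under construction.

```java
// Simplified, hypothetical model of the scenario; these names are
// illustrative and are not Hadoop's BlockManager API.
final class StaleReplicaCheck {
    enum BlockState { COMPLETE, UNDER_CONSTRUCTION }
    enum ReplicaState { FINALIZED, RBW }

    /**
     * Returns true when a reported replica should be treated as corrupt:
     * the NN's block is under construction, the DN reports the replica as
     * FINALIZED, and the reported generation stamp is older than the stored one.
     */
    static boolean isStaleFinalizedReplica(BlockState storedState, long storedGenStamp,
                                           ReplicaState reportedState, long reportedGenStamp) {
        return storedState == BlockState.UNDER_CONSTRUCTION
                && reportedState == ReplicaState.FINALIZED
                && reportedGenStamp < storedGenStamp;
    }

    public static void main(String[] args) {
        // DN1 restarts and reports FINALIZED blk_id_gs1 while the NN holds blk_id_gs2.
        System.out.println(isStaleFinalizedReplica(
                BlockState.UNDER_CONSTRUCTION, 2L, ReplicaState.FINALIZED, 1L));
        // DN2 reports a replica that carries the current generation stamp.
        System.out.println(isStaleFinalizedReplica(
                BlockState.UNDER_CONSTRUCTION, 2L, ReplicaState.RBW, 2L));
    }
}
```

In this model the first call corresponds to step 6 of the scenario (DN1's stale replica), and the second to a live pipeline node whose replica should remain a valid location.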
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084664#comment-14084664 ]

Hudson commented on HDFS-5723:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1827 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1827/])
HDFS-5723. Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction. Contributed by Vinayakumar B. (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615491)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084548#comment-14084548 ]

Hudson commented on HDFS-5723:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #633 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/633/])
HDFS-5723. Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction. Contributed by Vinayakumar B. (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615491)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084375#comment-14084375 ]

Hudson commented on HDFS-5723:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #6005 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6005/])
HDFS-5723. Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction. Contributed by Vinayakumar B. (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615491)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084371#comment-14084371 ]

Vinayakumar B commented on HDFS-5723:
-------------------------------------

Committed to trunk and branch-2.
Thanks [~umamaheswararao], [~jingzhao] and [~stanley_shi]
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084364#comment-14084364 ]

Vinayakumar B commented on HDFS-5723:
-------------------------------------

Thanks Uma for the review.
I will commit this soon.
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071910#comment-14071910 ]

Vinayakumar B commented on HDFS-5723:
-------------------------------------

Thanks [~umamaheswararao] for the explanation above and for reviewing the patch.
I am very confident that the test failure is not related to this patch. It has been observed in many of the QA builds today and needs to be investigated. I will verify it and create a ticket if any issue is observed.
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071846#comment-14071846 ]

Uma Maheswara Rao G commented on HDFS-5723:
-------------------------------------------

+1, the latest patch looks good to me. Could you please comment on the failure above?
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071844#comment-14071844 ]

Uma Maheswara Rao G commented on HDFS-5723:
-------------------------------------------

The case Jing asked about above should work fine, as Vinay answered.
When a DN reports a replica with an older genstamp, the NN adds it to the corrupt replicas map, and the NN asks the DN to remove the block with the older genstamp only when it processes corrupt replicas. The DN cannot remove it, since the DN has already bumped the genstamp.
On the next report from the DN with the newer genstamp, the NN removes the replica from the corrupt map if it exists there.

{code}
      } else {
        // if the same block is added again and the replica was corrupt
        // previously because of a wrong gen stamp, remove it from the
        // corrupt block list.
        corruptReplicas.removeFromCorruptReplicasMap(block, node,
            Reason.GENSTAMP_MISMATCH);
        curReplicaDelta = 0;
        blockLog.warn("BLOCK* addStoredBlock: "
            + "Redundant addStoredBlock request received for " + storedBlock
            + " on " + node + " size " + storedBlock.getNumBytes());
      }
{code}
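The add-then-remove lifecycle described in this comment can be sketched with a small in-memory tracker. This is only an illustrative model under stated assumptions: CorruptReplicaTracker and all of its methods are hypothetical names rather than Hadoop classes, and the real NN bookkeeping involves more state than a map from block ID to node names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the NN's corrupt-replica bookkeeping; not Hadoop code.
final class CorruptReplicaTracker {
    // Block ID -> generation stamp the NN currently expects for that block.
    private final Map<Long, Long> expectedGenStamp = new HashMap<>();
    // Block ID -> datanodes whose replica was reported with a stale generation stamp.
    private final Map<Long, Set<String>> corrupt = new HashMap<>();

    void setExpectedGenStamp(long blockId, long genStamp) {
        expectedGenStamp.put(blockId, genStamp);
    }

    /** Processes one replica report from a datanode. */
    void onReplicaReported(long blockId, String node, long reportedGenStamp) {
        Long expected = expectedGenStamp.get(blockId);
        if (expected == null) {
            return; // unknown block: nothing to track in this sketch
        }
        if (reportedGenStamp < expected) {
            // Older genstamp: remember the replica as corrupt.
            corrupt.computeIfAbsent(blockId, k -> new HashSet<>()).add(node);
        } else {
            // Matching genstamp: clear any earlier genstamp-mismatch entry.
            Set<String> nodes = corrupt.get(blockId);
            if (nodes != null) {
                nodes.remove(node);
                if (nodes.isEmpty()) {
                    corrupt.remove(blockId);
                }
            }
        }
    }

    boolean isCorrupt(long blockId, String node) {
        Set<String> nodes = corrupt.get(blockId);
        return nodes != null && nodes.contains(node);
    }

    public static void main(String[] args) {
        CorruptReplicaTracker tracker = new CorruptReplicaTracker();
        tracker.setExpectedGenStamp(1L, 2L);        // append bumped the block to gs2
        tracker.onReplicaReported(1L, "DN1", 1L);   // delayed report still carries gs1
        System.out.println(tracker.isCorrupt(1L, "DN1"));
        tracker.onReplicaReported(1L, "DN1", 2L);   // later report carries gs2
        System.out.println(tracker.isCorrupt(1L, "DN1"));
    }
}
```

The point of the sketch is that a stale report only marks the replica temporarily: a later report from the same node with the current generation stamp clears the entry, which is why a delayed block-received report does not leave a healthy replica permanently flagged.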
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071630#comment-14071630 ]

Hadoop QA commented on HDFS-5723:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12657310/HDFS-5723.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7440//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7440//console

This message is automatically generated.
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026152#comment-14026152 ]

Vinayakumar B commented on HDFS-5723:
-------------------------------------

Hi Stanley,
If the issue is different, then you can file a separate Jira for that.
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026090#comment-14026090 ]

stanley shi commented on HDFS-5723:
-----------------------------------

Hi Vinay,
It seems my steps trigger a different error, but with the same error log. Do you want to fix it in this ticket, or would you prefer that I submit another ticket?
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019741#comment-14019741 ]

stanley shi commented on HDFS-5723:
-----------------------------------

It does not always occur, but it occurs at least 20-30 percent of the time in my environment.
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019738#comment-14019738 ]

stanley shi commented on HDFS-5723:
-----------------------------------

Hi Vinay, there is still a bug with this patch.
I just tested this patch in my environment. Here are the steps:

Environment: HA cluster with 4 datanodes.
1. Put one file (one block) to HDFS with repl=3.
2. Use "fsck" to check which datanodes have the block, and which datanode does not have it (the "free node").
3. Stop the first datanode that has the block (the first one shown in the fsck output).
4. Append content to the file 100 times.
5. Stop the "free node" and restart the "first datanode".
6. Append content to the file 100 times again.

Check the datanodes: this error message still occurs.
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969320#comment-13969320 ]

Vinayakumar B commented on HDFS-5723:
-------------------------------------

Hi, can someone take a look at the updated patch? Thanks in advance.
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923719#comment-13923719 ]

Hadoop QA commented on HDFS-5723:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12633322/HDFS-5723.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6336//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6336//console

This message is automatically generated.
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911314#comment-13911314 ] Vinayakumar B commented on HDFS-5723: - Are any more updates required on this Jira? > Append failed FINALIZED replica should not be accepted as valid when that > block is underconstruction > > > Key: HDFS-5723 > URL: https://issues.apache.org/jira/browse/HDFS-5723 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-5723.patch > > > Scenario: > 1. 3 node cluster with > dfs.client.block.write.replace-datanode-on-failure.enable set to false. > 2. One file is written with 3 replicas, blk_id_gs1 > 3. One of the datanode DN1 is down. > 4. File was opened with append and some more data is added to the file and > synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2 > 5. Now DN1 restarted > 6. In this block report, DN1 reported FINALIZED block blk_id_gs1, this should > be marked corrupted. > but since NN having appended block state as UnderConstruction, at this time > its not detecting this block as corrupt and adding to valid block locations. > As long as the namenode is alive, this datanode also will be considered as > valid replica and read/append will fail in that datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880721#comment-13880721 ] Vinay commented on HDFS-5723: - bq. Hi Vinay, one question about the patch: so this inconsistent generation stamp can also be caused by a delayed block-received report? I.e., after the first close(), the DN's report gets delayed and is received by NN when the append starts. In that case, will we have any issue by wrongly putting the (block, DN) into the corruptBlockMap? It can happen. The earlier block with the previous genstamp will be marked corrupt if the append pipeline setup, with its genstamp update, happens before the previous report arrives. But append pipeline creation will update the genstamp on that datanode as well, so one more block-received report is expected with the correct genstamp. That report will remove the old block from the corrupt replica map. > Append failed FINALIZED replica should not be accepted as valid when that > block is underconstruction > > > Key: HDFS-5723 > URL: https://issues.apache.org/jira/browse/HDFS-5723 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5723.patch > > > Scenario: > 1. 3 node cluster with > dfs.client.block.write.replace-datanode-on-failure.enable set to false. > 2. One file is written with 3 replicas, blk_id_gs1 > 3. One of the datanode DN1 is down. > 4. File was opened with append and some more data is added to the file and > synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2 > 5. Now DN1 restarted > 6. In this block report, DN1 reported FINALIZED block blk_id_gs1, this should > be marked corrupted. > but since NN having appended block state as UnderConstruction, at this time > its not detecting this block as corrupt and adding to valid block locations. 
> As long as the namenode is alive, this datanode also will be considered as > valid replica and read/append will fail in that datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
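The exchange about a delayed block-received report can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the BlockManager code from the patch: the class GenStampReportSketch and all of its methods and maps are hypothetical names invented for illustration. It shows a FINALIZED report with an old generation stamp being marked corrupt once an append has bumped the genstamp, and a later report with the current genstamp clearing that mark.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of NameNode-side replica bookkeeping; every name here is hypothetical. */
class GenStampReportSketch {
    // Generation stamp the NameNode currently expects for each block id.
    private final Map<Long, Long> expectedGenStamp = new HashMap<>();
    // Block id -> datanodes whose replica is currently treated as corrupt.
    private final Map<Long, Set<String>> corruptReplicas = new HashMap<>();
    // Block id -> datanodes currently accepted as valid locations.
    private final Map<Long, Set<String>> validLocations = new HashMap<>();

    /** Registers a block with its initial generation stamp (gs1 in the scenario). */
    void addBlock(long blockId, long genStamp) {
        expectedGenStamp.put(blockId, genStamp);
    }

    /** Models append pipeline setup: the block gets a newer generation stamp (gs2). */
    void startAppend(long blockId, long newGenStamp) {
        expectedGenStamp.put(blockId, newGenStamp);
    }

    /** Handles a FINALIZED replica report from one datanode. */
    void reportFinalized(String datanode, long blockId, long reportedGenStamp) {
        Long expected = expectedGenStamp.get(blockId);
        if (expected == null) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        Set<String> corrupt = corruptReplicas.computeIfAbsent(blockId, k -> new HashSet<>());
        Set<String> valid = validLocations.computeIfAbsent(blockId, k -> new HashSet<>());
        if (reportedGenStamp != expected.longValue()) {
            // Stale replica: do not accept it as a valid location.
            valid.remove(datanode);
            corrupt.add(datanode);
        } else {
            // Report with the current genstamp: clear any earlier corrupt mark.
            corrupt.remove(datanode);
            valid.add(datanode);
        }
    }

    boolean isCorrupt(String datanode, long blockId) {
        return corruptReplicas.getOrDefault(blockId, Collections.<String>emptySet()).contains(datanode);
    }

    boolean isValid(String datanode, long blockId) {
        return validLocations.getOrDefault(blockId, Collections.<String>emptySet()).contains(datanode);
    }

    public static void main(String[] args) {
        GenStampReportSketch nn = new GenStampReportSketch();
        nn.addBlock(1L, 1L);               // block written with gs1
        nn.startAppend(1L, 2L);            // append bumps the genstamp to gs2
        nn.reportFinalized("DN1", 1L, 1L); // stale or delayed report carrying gs1
        System.out.println("corrupt after stale report: " + nn.isCorrupt("DN1", 1L));
        nn.reportFinalized("DN1", 1L, 2L); // follow-up report carrying gs2
        System.out.println("corrupt after fresh report: " + nn.isCorrupt("DN1", 1L));
    }
}
```

In this model a replica that never sends the follow-up report (DN1 in the original scenario, which missed the append entirely) simply stays in the corrupt set, which is the behaviour the issue asks for.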
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880525#comment-13880525 ] Jing Zhao commented on HDFS-5723: - Hi Vinay, one question about the patch: so this inconsistent generation stamp can also be caused by a delayed block-received report? I.e., after the first close(), the DN's report gets delayed and is received by NN when the append starts. In that case, will we have any issue by wrongly putting the (block, DN) into the corruptBlockMap? > Append failed FINALIZED replica should not be accepted as valid when that > block is underconstruction > > > Key: HDFS-5723 > URL: https://issues.apache.org/jira/browse/HDFS-5723 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5723.patch > > > Scenario: > 1. 3 node cluster with > dfs.client.block.write.replace-datanode-on-failure.enable set to false. > 2. One file is written with 3 replicas, blk_id_gs1 > 3. One of the datanode DN1 is down. > 4. File was opened with append and some more data is added to the file and > synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2 > 5. Now DN1 restarted > 6. In this block report, DN1 reported FINALIZED block blk_id_gs1, this should > be marked corrupted. > but since NN having appended block state as UnderConstruction, at this time > its not detecting this block as corrupt and adding to valid block locations. > As long as the namenode is alive, this datanode also will be considered as > valid replica and read/append will fail in that datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865487#comment-13865487 ] Hadoop QA commented on HDFS-5723: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621965/HDFS-5723.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5845//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5845//console This message is automatically generated. > Append failed FINALIZED replica should not be accepted as valid when that > block is underconstruction > > > Key: HDFS-5723 > URL: https://issues.apache.org/jira/browse/HDFS-5723 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5723.patch > > > Scenario: > 1. 3 node cluster with > dfs.client.block.write.replace-datanode-on-failure.enable set to false. > 2. One file is written with 3 replicas, blk_id_gs1 > 3. One of the datanode DN1 is down. > 4. 
File was opened with append and some more data is added to the file and > synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2 > 5. Now DN1 restarted > 6. In this block report, DN1 reported FINALIZED block blk_id_gs1, this should > be marked corrupted. > but since NN having appended block state as UnderConstruction, at this time > its not detecting this block as corrupt and adding to valid block locations. > As long as the namenode is alive, this datanode also will be considered as > valid replica and read/append will fail in that datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863898#comment-13863898 ] Vinay commented on HDFS-5723: -
{noformat}
2014-01-07 09:47:22,878 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: :25012:DataXceiver error processing WRITE_BLOCK operation src: /:56873 dest: /:25012
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append to a non-existent replica BP-1746676845--1388725564463:blk_1073742062_1268
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getReplicaInfo(FsDatasetImpl.java:372)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:507)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:93)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:200)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:457)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
{noformat}
2014-01-07 09:47:46,773 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: :25012:DataXceiver error processing READ_BLOCK operation src: /:35125 dest: /:25012
java.io.IOException: Replica gen stamp < block genstamp, block=BP-1746676845--1388725564463:blk_1073742062_1270, replica=FinalizedReplica, blk_1073742062_1266, FINALIZED
  getNumBytes() = 6
  getBytesOnDisk() = 6
  getVisibleLength()= 6
  getVolume() = /home/vinay/hadoop/dfs/data/current
  getBlockFile()= /home/vinay/hadoop/dfs/data/current/BP-1746676845--1388725564463/current/finalized/blk_1073742062
  unlinked =false
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:247)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:328)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
> Append failed FINALIZED replica should not be accepted as valid when that > block is underconstruction > > > Key: HDFS-5723 > URL: https://issues.apache.org/jira/browse/HDFS-5723 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Vinay > > Scenario: > 1. 3 node cluster with > dfs.client.block.write.replace-datanode-on-failure.enable set to false. > 2. One file is written with 3 replicas, blk_id_gs1 > 3. One of the datanode DN1 is down. > 4. File was opened with append and some more data is added to the file and > synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2 > 5. Now DN1 restarted > 6. In this block report, DN1 reported FINALIZED block blk_id_gs1, this should > be marked corrupted. > but since NN having appended block state as UnderConstruction, at this time > its not detecting this block as corrupt and adding to valid block locations. > As long as the namenode is alive, this datanode also will be considered as > valid replica and read/append will fail in that datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
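The READ_BLOCK failure in the log above comes down to a generation stamp comparison on the datanode: the replica on disk carries genstamp 1266 while the client asks for 1270. The following is a minimal sketch of that kind of check, assuming nothing about the real BlockSender beyond the message text shown in the log; the class ReplicaReadCheckSketch and its methods are hypothetical names used only for illustration.

```java
import java.io.IOException;

/** Toy version of a datanode-side read check; not the actual BlockSender code. */
class ReplicaReadCheckSketch {
    /** A replica is stale when its generation stamp is older than the requested one. */
    static boolean isStale(long replicaGenStamp, long requestedGenStamp) {
        return replicaGenStamp < requestedGenStamp;
    }

    /** Rejects the read with a message modelled on the one in the log. */
    static void checkReadable(long blockId, long replicaGenStamp, long requestedGenStamp)
            throws IOException {
        if (isStale(replicaGenStamp, requestedGenStamp)) {
            throw new IOException("Replica gen stamp < block genstamp, block=blk_"
                    + blockId + "_" + requestedGenStamp
                    + ", replica=blk_" + blockId + "_" + replicaGenStamp);
        }
    }

    public static void main(String[] args) {
        try {
            // Values taken from the log: replica at genstamp 1266, request for 1270.
            checkReadable(1073742062L, 1266L, 1270L);
            System.out.println("read accepted");
        } catch (IOException e) {
            System.out.println("read rejected: " + e.getMessage());
        }
    }
}
```

Because the datanode refuses such reads on its own, a NameNode that keeps listing the stale replica as a valid location only sends clients to a node that will fail them, which is why the issue asks for the replica to be marked corrupt at block report time.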