[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-1172:
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Fix Version/s: 2.8.0
    Target Version/s: (was: 3.0.0, 2.1.0-beta, 1.3.0)
    Status: Resolved (was: Patch Available)

I've committed this into trunk and branch-2. Thanks [~iwasakims] for continuing and finishing the work!

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Masatake Iwasaki
> Fix For: 2.8.0
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.009.patch, HDFS-1172.010.patch, HDFS-1172.011.patch, HDFS-1172.012.patch, HDFS-1172.013.patch, HDFS-1172.014.patch, HDFS-1172.014.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
>
> I've seen this for a long time, and imagine it's a known issue, but couldn't
> find an existing JIRA. It often happens that we see the NN schedule
> replication on the last block of files very quickly after they're completed,
> before the other DNs in the pipeline have a chance to report the new block.
> This results in a lot of extra replication work on the cluster, as we
> replicate the block and then end up with multiple excess replicas which are
> very quickly deleted.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
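The following toy Java model puts numbers on the race described in the issue. It is not HDFS code, and `ExcessReplicaSketch` and its parameters are hypothetical. The model makes two assumptions. First, the replication monitor runs exactly once, after the file is completed but before the slow DataNodes report. Second, every copy it schedules succeeds. The flag `countPipelineAsPending` stands in for the approach taken in this JIRA: pipeline targets that have not reported yet are treated as pending replicas.

```java
/**
 * Toy model (hypothetical, not HDFS code) of the race in HDFS-1172.
 * A block was written through a pipeline of pipelineSize DataNodes, but only
 * reportedAtCompletion of them had sent blockReceived when the file completed.
 */
final class ExcessReplicaSketch {
    private ExcessReplicaSketch() {}

    /** Copies the NameNode schedules if its monitor runs before the late reports arrive. */
    static int scheduledCopies(int replication, int pipelineSize,
                               int reportedAtCompletion, boolean countPipelineAsPending) {
        int live = reportedAtCompletion;
        // With the fix, unreported pipeline targets count as pending replicas.
        int pending = countPipelineAsPending ? pipelineSize - reportedAtCompletion : 0;
        return Math.max(0, replication - (live + pending));
    }

    /** Replicas above the replication factor once every pipeline DataNode and every scheduled copy has reported. */
    static int excessReplicas(int replication, int pipelineSize,
                              int reportedAtCompletion, boolean countPipelineAsPending) {
        int eventual = pipelineSize
                + scheduledCopies(replication, pipelineSize, reportedAtCompletion, countPipelineAsPending);
        return Math.max(0, eventual - replication);
    }
}
```

Take a replication factor of 3 where only one of three pipeline DataNodes has reported at completion. `scheduledCopies(3, 3, 1, false)` returns 2, and `excessReplicas(3, 3, 1, false)` also returns 2, which matches the "multiple excess replicas" in the report. With the flag set, both return 0.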
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-1172:
    Attachment: HDFS-1172.014.patch

Attaching the same file again to kick Jenkins.

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Masatake Iwasaki
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.009.patch, HDFS-1172.010.patch, HDFS-1172.011.patch, HDFS-1172.012.patch, HDFS-1172.013.patch, HDFS-1172.014.patch, HDFS-1172.014.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-1172:
    Attachment: HDFS-1172.014.patch

bq. Only nit is we can change the following if condition to {{if (b && !lastBlock.isStriped())}} to make sure we do not put duplicated records into the pending queue.

Sure. I attached 014. The tests that failed in the QA build succeeded in my environment.

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Masatake Iwasaki
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.009.patch, HDFS-1172.010.patch, HDFS-1172.011.patch, HDFS-1172.012.patch, HDFS-1172.013.patch, HDFS-1172.014.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
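Below is a minimal Java sketch of the guard discussed in review. It is not the BlockManager code, and it rests on three labelled assumptions. First, `b` is taken to be true only on the call that actually completes the block. Second, the pending queue is modeled as a plain map from block ID to expected replica count. Third, `PendingQueueGuardSketch` and `maybeAddPending` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the guard suggested in review; not the actual BlockManager code. */
final class PendingQueueGuardSketch {
    /** Block ID -> number of replicas expected from pipeline targets that have not reported yet. */
    final Map<Long, Integer> pendingQueue = new HashMap<>();

    /**
     * @param blockId           ID of the block being committed or completed
     * @param b                 assumed true only when this call actually completed the block
     * @param isStriped         striped (erasure-coded) blocks are skipped entirely
     * @param unreportedTargets pipeline targets that have not sent blockReceived yet
     */
    void maybeAddPending(long blockId, boolean b, boolean isStriped, int unreportedTargets) {
        if (b && !isStriped) {
            pendingQueue.merge(blockId, unreportedTargets, Integer::sum);
        }
    }
}
```

Under these assumptions, an earlier call that did not complete the block adds nothing. A striped block is never queued. Only the completing call records the unreported targets, so the queue does not receive duplicate records for the same block.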
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-1172:
    Attachment: HDFS-1172.013.patch

Thanks, [~jingzhao]! Your comment makes sense. I attached 013.

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Masatake Iwasaki
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.009.patch, HDFS-1172.010.patch, HDFS-1172.011.patch, HDFS-1172.012.patch, HDFS-1172.013.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-1172:
    Attachment: HDFS-1172.012.patch

I updated the patch.
* Addressed the failure of TestRecoverStripedFile: pendingReplications is no longer updated if the file is striped.
* Added a call to {{DataNodeTestUtils#triggerHeartbeat}} to make sure that {{TestReplication#testNoExtraReplicationWhenBlockReceivedIsLate}} fails without the BlockManager fix.
* Fixed the checkstyle warnings, except for the one about file length.
* Fixed the whitespace error.
* The release audit issue is not related to the fix.
* The failures of TestBlockReport and TestCheckpoint are not on the code path of the patch. I could not reproduce them in my environment.

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Masatake Iwasaki
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.009.patch, HDFS-1172.010.patch, HDFS-1172.011.patch, HDFS-1172.012.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-1172:
    Attachment: HDFS-1172.011.patch

I attached the updated patch as 011.
* pendingReplications is updated only before file completion.
* Refactored the test code in TestReplication to use Mockito rather than adding test code to BPOfferService.

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Masatake Iwasaki
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.009.patch, HDFS-1172.010.patch, HDFS-1172.011.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-1172:
    Assignee: Masatake Iwasaki

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Masatake Iwasaki
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.009.patch, HDFS-1172.010.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-1172:
    Attachment: HDFS-1172.010.patch

bq. Also if a block's effective replica number (including pending replica number) is >= than its replication factor, the block should not be in neededReplication.

I reconsidered this and fixed {{checkReplication}} accordingly. I also addressed the checkstyle warnings. The file-length warning for BlockManager.java was not introduced by this patch. The failure of {{TestBlockManager.testBlocksAreNotUnderreplicatedInSingleRack}} seems unrelated to the patch, and I could not reproduce it in my environment.

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.009.patch, HDFS-1172.010.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
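Here is a hypothetical Java sketch of the rule quoted above. Effective replicas are live replicas plus pending replicas, and a block whose effective count meets the replication factor should not stay in neededReplications. `EffectiveReplicaSketch`, `hasEnoughEffective` and `recheck` are illustrative names only. The thread mentions a real `BlockManager#hasEnoughEffectiveReplicas`, but this sketch does not reproduce its signature or logic.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of keeping neededReplications consistent; not HDFS code. */
final class EffectiveReplicaSketch {
    /** Block IDs the model considers under-replicated. */
    final Set<Long> neededReplications = new HashSet<>();

    /** Effective replicas = replicas already reported live + replicas still pending. */
    static boolean hasEnoughEffective(int live, int pending, int replication) {
        return live + pending >= replication;
    }

    /** Re-evaluates one block and adds it to, or removes it from, neededReplications. */
    void recheck(long blockId, int live, int pending, int replication) {
        if (hasEnoughEffective(live, pending, replication)) {
            neededReplications.remove(blockId);
        } else {
            neededReplications.add(blockId);
        }
    }
}
```

Suppose a block has one live replica and two pending ones with a replication factor of 3. It is not queued, because the two unreported pipeline replicas are expected to arrive.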
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-1172:
    Attachment: HDFS-1172.009.patch

I attached the updated patch as 009.
* Fixed intermittent failures of {{TestReplication#testNoExtraReplicationWhenBlockReceivedIsLate}}:
** Changed the number of DataNodes in the MiniDFSCluster to 3.
** Set {{blockReceivedDelayForTestsSetting}} for only one DataNode. This gives a sufficient time window in which the file is completed but at least one of the replicas has not been reported.
** Removed the randomized sleep time in {{BPServiceActor#delayBeforeBlockReceivedForTests}}.
* It turned out that other JIRAs had already added some of the fixes needed here:
** {{BlockManager#hasEnoughEffectiveReplicas}}, added by HDFS-8938, takes pending replicas into account.
** HDFS-8623 fixed {{numCurrentReplica}} in {{BlockManager#addStoredBlock}} to take pending replicas into account.
* The test failures below are unrelated and have already been filed or fixed:
** TestLazyWriter: HDFS-9067
** TestLazyPersistLockedMemory: HDFS-9073
** TestJMXGet: HDFS-9072
** TestLazyPersistReplicaPlacement: HDFS-9074

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.009.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
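The test change above depends on a deterministic window: the file must be complete while at least one replica is still unreported. The small Java sketch below counts how many DataNodes have not reported at the moment the file completes. It is a hypothetical illustration, not the test code, and the timestamps in the usage are made up. With a fixed delay on exactly one of three DataNodes, the count is always 1. With random delays it could be 0, and the test could then pass without exercising the fix.

```java
/** Hypothetical illustration of the timing window the test relies on; not the test code itself. */
final class ReportWindowSketch {
    private ReportWindowSketch() {}

    /**
     * Counts DataNodes whose blockReceived report reaches the NameNode after the file is completed.
     *
     * @param completeTimeMs time at which the client completes the file
     * @param reportTimesMs  time at which each DataNode's blockReceived reaches the NameNode
     */
    static int unreportedAtCompletion(long completeTimeMs, long[] reportTimesMs) {
        int unreported = 0;
        for (long t : reportTimesMs) {
            if (t > completeTimeMs) {
                unreported++;
            }
        }
        return unreported;
    }
}
```

For example, if two DataNodes report at 10 ms and 12 ms and only the third is delayed to 3100 ms, completing the file at 100 ms leaves exactly one replica unreported.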
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-1172:
    Target Version/s: 2.1.0-beta, 3.0.0, 1.3.0 (was: 3.0.0, 2.1.0-beta, 1.3.0)
    Status: Patch Available (was: Open)

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-1172:
    Attachment: HDFS-1172.008.patch

I rebased the patch on the current trunk and attached it as HDFS-1172.008.patch.
* I added a call to {{BlockManagerTestUtil#computeAllPendingWork}} in {{TestReplication#pendingReplicationCount}} to make sure that replication is scheduled. This is needed for {{TestReplication#testNoExtraReplicationWhenBlockReceivedIsLate}} to fail without the BlockManager fix.
* {{TestReplication#testReplicationWhenBlockCorruption}} succeeds even without the BlockManager fix. I left it in the patch anyway, because TestReplication has no equivalent test.

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-1172:
    Attachment: HDFS-1172-150907.patch

Rebased against trunk.

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Attachments: HDFS-1172-150907.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HDFS-1172:
    Fix Version/s: (was: 0.24.0)

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Attachments: HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated HDFS-1172:
    Target Version/s: 3.0.0, 2.0.5-beta, 1.3.0 (was: 1.2.0, 3.0.0, 2.0.5-beta)

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Fix For: 0.24.0
> Attachments: HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch, replicateBlocksFUC.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1172:
    Assignee: (was: Todd Lipcon)

Hey folks. Sorry I let this one drop off my radar for a couple years :) I don't think I'll have time to work on it in the coming months, so if you want to take it over, go ahead. I think the remaining issue was that the test coverage is still a little weak (and will probably need significant rebasing for 2.x/3.x)

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Fix For: 0.24.0
> Attachments: HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch, replicateBlocksFUC.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-1172:
    Target Version/s: 1.2.0, 3.0.0, 2.0.5-beta

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.24.0
> Attachments: HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch, replicateBlocksFUC.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-1172:
    Attachment: hdfs-1172.txt

Updated patch rebased on trunk.

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.24.0
> Attachments: HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1172:
    Status: Open (was: Patch Available)

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.22.0, 0.23.0
> Attachments: HDFS-1172.patch, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-1172:
    Fix Version/s: 0.22.0

> Blocks in newly completed files are considered under-replicated too quickly
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.22.0, 0.23.0
> Attachments: HDFS-1172.patch, hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1172:
    Attachment: hdfs-1172.txt

Here's a new patch against trunk for this issue. A few things changed since Hairong's original patch:
- I removed the part of the test that changes the replication factor of a file while it's under construction. This part of the test wasn't succeeding reliably, since it was running into a different bug: HDFS-2283.
- Added the test code from HDFS-1197, which allows the DNs to artificially delay blockReceived calls in the tests. This exposed some other bugs with the patch.
- The new replicateLastBlock code needed to be called in a different place:
-- The original patch called it on every attempt of completeFile(), rather than only on the final/successful attempt. This meant that, if the replicas were very slow to check in, the targets would be added to pendingReplication many times, yielding a pending replica count much larger than the actual replication factor.
-- The code needs to be called for all blocks, not just the last block in a file.

I looped the new tests for a while and they pass reliably.

> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: HDFS-1172.patch, hdfs-1172.txt,
> replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
>
>
> I've seen this for a long time, and imagine it's a known issue, but couldn't
> find an existing JIRA. It often happens that we see the NN schedule
> replication on the last block of files very quickly after they're completed,
> before the other DNs in the pipeline have a chance to report the new block.
> This results in a lot of extra replication work on the cluster, as we
> replicate the block and then end up with multiple excess replicas which are
> very quickly deleted.

-- 
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
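Todd's point about completeFile() retries is easiest to see with numbers. The sketch below is a hypothetical, heavily simplified model (the class and method names are invented for illustration and are not the NameNode's code): if every completeFile() attempt adds the pipeline targets to pending replication, the pending count grows with each retry, whereas adding them only on the successful attempt leaves it equal to the pipeline size.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model (not HDFS code) of adding a closing file's
 * pipeline targets to pending replication, with and without the retry bug.
 */
public class PendingOnComplete {
    // blockId -> number of replicas the NameNode expects to be reported soon
    private static final Map<Long, Integer> pending = new HashMap<>();

    // Record the pipeline targets as pending (expected) replicas of the block.
    static void addPipelineTargets(long blockId, int targets) {
        pending.merge(blockId, targets, Integer::sum);
    }

    /**
     * Simulates a client retrying completeFile() until it succeeds.
     *
     * @param failedAttempts    attempts that fail because replicas are slow to report
     * @param addOnEveryAttempt true reproduces the original patch's behavior
     * @return the pending replica count recorded for the block afterwards
     */
    static int simulate(long blockId, int targets, int failedAttempts,
                        boolean addOnEveryAttempt) {
        pending.clear();
        for (int attempt = 0; attempt <= failedAttempts; attempt++) {
            boolean success = (attempt == failedAttempts);
            if (addOnEveryAttempt || success) {
                addPipelineTargets(blockId, targets);
            }
        }
        return pending.getOrDefault(blockId, 0);
    }

    public static void main(String[] args) {
        // 3-node pipeline, two failed completeFile() attempts before success.
        System.out.println("every attempt: " + simulate(1L, 3, 2, true));  // 9
        System.out.println("success only:  " + simulate(1L, 3, 2, false)); // 3
    }
}
```

With a replication factor of 3, the first variant records 9 pending replicas, which is the "pending replica count much larger than the actual replication factor" described in the comment.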
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1172:
    Attachment: replicateBlocksFUC1.patch

Resubmitting this to trigger hudson.

> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Hairong Kuang
> Fix For: 0.23.0
>
> Attachments: HDFS-1172.patch, replicateBlocksFUC.patch,
> replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1172:
    Status: Patch Available (was: Open)

> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Hairong Kuang
> Fix For: 0.23.0
>
> Attachments: HDFS-1172.patch, replicateBlocksFUC.patch,
> replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1172:
    Fix Version/s: 0.23.0
    Status: Open (was: Patch Available)

> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Hairong Kuang
> Fix For: 0.23.0
>
> Attachments: HDFS-1172.patch, replicateBlocksFUC.patch,
> replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1172:
    Status: Patch Available (was: Open)

> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Hairong Kuang
>
> Attachments: HDFS-1172.patch, replicateBlocksFUC.patch,
> replicateBlocksFUC1.patch
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1172:
    Attachment: replicateBlocksFUC1.patch

This patch
1. makes sure that blocks in a newly closed file do not get over-replicated;
2. makes sure that all blocks except the last block in a file under construction get replicated when under-replicated. This will allow a decommissioning datanode to finish decommissioning even if it has replicas in files under construction;
3. adds a unit test.

> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Hairong Kuang
>
> Attachments: HDFS-1172.patch, replicateBlocksFUC.patch,
> replicateBlocksFUC1.patch
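The first two points of Hairong's patch description amount to a per-block decision. The following is a minimal sketch under stated assumptions, not the patch itself: `needsReplication` and `FileState` are hypothetical names, `live` is assumed to count reported replicas on nodes that are not decommissioning, and `pending` is assumed to count pipeline targets that have not reported yet.

```java
/**
 * Hypothetical decision helper (not HDFS code) mirroring the first two points
 * of the patch description: skip the last block of a file under construction,
 * and count expected-but-unreported replicas for everything else.
 */
public class ReplicationEligibility {
    enum FileState { UNDER_CONSTRUCTION, COMPLETE }

    /**
     * @param blockIndex  0-based position of the block in the file
     * @param blockCount  number of blocks in the file
     * @param live        replicas already reported on nodes that are not decommissioning
     * @param pending     replicas expected soon (e.g. pipeline targets not yet reported)
     * @param replication the file's target replication factor
     */
    static boolean needsReplication(FileState state, int blockIndex, int blockCount,
                                    int live, int pending, int replication) {
        boolean isLastBlock = (blockIndex == blockCount - 1);
        if (state == FileState.UNDER_CONSTRUCTION && isLastBlock) {
            // The pipeline is still writing this block; leave it alone.
            return false;
        }
        // Counting pending replicas keeps a just-closed file from being
        // re-replicated and then trimmed as excess.
        return live + pending < replication;
    }

    public static void main(String[] args) {
        // Newly closed file: 1 replica reported, 2 pipeline targets still pending.
        System.out.println(needsReplication(FileState.COMPLETE, 0, 1, 1, 2, 3));           // false
        // Finished block of an open file with only 1 replica outside a decommissioning node.
        System.out.println(needsReplication(FileState.UNDER_CONSTRUCTION, 0, 2, 1, 0, 3)); // true
        // Last block of an open file is never scheduled here.
        System.out.println(needsReplication(FileState.UNDER_CONSTRUCTION, 1, 2, 0, 0, 3)); // false
    }
}
```

The second case is what lets decommissioning finish in this model: earlier, already-finished blocks of an open file are still eligible for re-replication, so the decommissioning node's copies can be replaced.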
[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1172:
    Attachment: replicateBlocksFUC.patch

An initial patch for review. Will add a unit test and do more testing.

> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Hairong Kuang
>
> Attachments: HDFS-1172.patch, replicateBlocksFUC.patch
[jira] Updated: (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
[ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated HDFS-1172:
    Attachment: HDFS-1172.patch

Does this patch look like what has been discussed here? It puts under-replicated blocks into the pending replication queue in the case of a newly created file.

> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
>
> Attachments: HDFS-1172.patch
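Putting a block into a pending replication queue defers the decision rather than cancelling it: if the expected replicas never report, the block should eventually be reconsidered. A minimal sketch, assuming a simple timeout-based queue; the class `PendingQueue`, its methods and the 5-second timeout are hypothetical, chosen for illustration, and not taken from this patch or from HDFS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pending-replication queue with a timeout (illustration only, not HDFS code). */
public class PendingQueue {
    private static final class Entry {
        int expected;         // replicas still expected to be reported
        final long deadline;  // time after which the NameNode stops waiting

        Entry(int expected, long deadline) {
            this.expected = expected;
            this.deadline = deadline;
        }
    }

    private final Map<Long, Entry> entries = new HashMap<>();
    private final long timeoutMs;

    PendingQueue(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** File completed while some pipeline replicas have not been reported yet. */
    void add(long blockId, int expected, long nowMs) {
        if (expected > 0) {
            entries.put(blockId, new Entry(expected, nowMs + timeoutMs));
        }
    }

    /** A DataNode reported a new replica of the block. */
    void replicaReported(long blockId) {
        Entry e = entries.get(blockId);
        if (e != null && --e.expected == 0) {
            entries.remove(blockId); // everything arrived; nothing left to wait for
        }
    }

    int pendingCount(long blockId) {
        Entry e = entries.get(blockId);
        return e == null ? 0 : e.expected;
    }

    /** Removes and returns blocks whose wait expired; in this model only these are re-checked. */
    List<Long> pollTimedOut(long nowMs) {
        List<Long> expired = new ArrayList<>();
        entries.entrySet().removeIf(en -> {
            boolean timedOut = nowMs >= en.getValue().deadline;
            if (timedOut) {
                expired.add(en.getKey());
            }
            return timedOut;
        });
        return expired;
    }

    public static void main(String[] args) {
        PendingQueue q = new PendingQueue(5_000);
        q.add(7L, 2, 0);                           // 2 of 3 replicas not yet reported
        q.replicaReported(7L);                     // one of them checks in
        System.out.println(q.pendingCount(7L));    // 1
        System.out.println(q.pollTimedOut(1_000)); // []
        System.out.println(q.pollTimedOut(6_000)); // [7]
    }
}
```

The design choice here is that a late replica simply shrinks the entry, while a missing one is only acted on after the timeout, which trades a short delay in repairing genuinely lost replicas for avoiding the excess-replica churn described in the issue.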