[ https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186145#comment-15186145 ]
Kai Zheng commented on HDFS-9822:
---------------------------------

Thanks [~zhzhoubin].

bq. I don't think we should schedule all EC tasks in the same queue.

Yeah, I agree. My earlier wording was confusing, so let me clarify. What I meant is that striped blocks can be tracked in a separate set of queues dedicated to striped files, in units of block groups rather than the internal blocks within a group. So if a group loses 3 internal blocks, only one entry instead of 3 is maintained in the queue(s).

bq. If a block group has lost 3 internal blocks, we should treat it with higher priority than one that has lost 1.

That's right. So when a new internal block is reported bad, the existing entry for the block group absorbs it and no new entry is created. I think this would basically avoid generating multiple reconstruction tasks for the same block group.

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9822
>                 URL: https://issues.apache.org/jira/browse/HDFS-9822
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Rakesh R
>         Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
> 	at java.lang.Thread.run(Thread.java:745)
> 	at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
> 	at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
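
To make the block-group idea from the comment concrete, here is a rough sketch of a queue keyed by block group rather than by internal block. It is only an illustration under assumptions: the class name BlockGroupReconstructionQueue and all of its methods are hypothetical, are not part of BlockManager or of the attached patches, and the "most lost internal blocks first" ordering is one possible reading of the priority discussion.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: tracks striped blocks that need reconstruction per
 * block group instead of per internal block. Not the HDFS implementation.
 */
class BlockGroupReconstructionQueue {
  // Block group ID -> indices of the internal blocks reported bad so far.
  private final Map<Long, Set<Integer>> missingByGroup = new HashMap<>();

  /**
   * Records a bad internal block. If the group already has an entry, the
   * report is merged into it. Returns true only when a new entry is created.
   */
  boolean reportBadInternalBlock(long blockGroupId, int internalBlockIndex) {
    Set<Integer> missing = missingByGroup.get(blockGroupId);
    boolean created = (missing == null);
    if (created) {
      missing = new HashSet<>();
      missingByGroup.put(blockGroupId, missing);
    }
    missing.add(internalBlockIndex);
    return created;
  }

  /** Number of queued entries, which equals the number of block groups. */
  int size() {
    return missingByGroup.size();
  }

  /** Number of distinct internal blocks reported bad for a group. */
  int missingCount(long blockGroupId) {
    Set<Integer> missing = missingByGroup.get(blockGroupId);
    return missing == null ? 0 : missing.size();
  }

  /**
   * Removes and returns the group that has lost the most internal blocks,
   * or null if nothing is queued. A linear scan keeps the sketch short.
   */
  Long pollMostDegraded() {
    Long best = null;
    int bestCount = 0;
    for (Map.Entry<Long, Set<Integer>> e : missingByGroup.entrySet()) {
      if (e.getValue().size() > bestCount) {
        best = e.getKey();
        bestCount = e.getValue().size();
      }
    }
    if (best != null) {
      missingByGroup.remove(best);
    }
    return best;
  }
}
```

With this shape, three bad-block reports for the same group leave a single entry, so at most one reconstruction task would be derived from it, and a group that has lost 3 internal blocks is polled before one that has lost 1.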