[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186145#comment-15186145
 ] 

Kai Zheng commented on HDFS-9822:
---------------------------------

Thanks [~zhzhoubin].
bq. I don't think we should schedule all EC tasks in the same queue.
Yeah, I agree. My earlier wording was confusing, so let me clarify. What I meant is 
that striped blocks can be tracked in a separate set of queues dedicated to 
striped files, in units of block groups rather than the internal blocks within 
groups. So if a group loses 3 internal blocks, only one entry instead of 3 is 
maintained in the queue(s).
bq. If a block group has lost 3 internal blocks, we should treat it with higher 
priority than one that has lost 1.
That's right. So when a new internal block is reported bad, the existing 
entry for the block group absorbs it and no new entry is created, 
right? I think this way we can basically avoid generating multiple 
reconstruction tasks for the same block group.
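
To make the idea concrete, here is a minimal sketch, not taken from any patch on this issue. The class and method names (BlockGroupReconstructionQueue, reportBadInternalBlock, pollMostUrgent) are hypothetical and are not BlockManager APIs. It assumes a block group is identified by a long ID and an internal block by its index in the group. It keeps one entry per block group, merges newly reported bad internal blocks into that entry, and hands out the group with the most missing internal blocks first.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not the actual HDFS data structure. */
class BlockGroupReconstructionQueue {
    // One entry per block group: group ID -> indices of its bad internal blocks.
    private final Map<Long, BitSet> missingByGroup = new HashMap<>();

    /**
     * Records a bad internal block. If the group already has an entry, the
     * index is merged into it. Returns true only when a new entry was created.
     */
    boolean reportBadInternalBlock(long blockGroupId, int internalIndex) {
        BitSet missing = missingByGroup.get(blockGroupId);
        boolean created = (missing == null);
        if (created) {
            missing = new BitSet();
            missingByGroup.put(blockGroupId, missing);
        }
        missing.set(internalIndex);
        return created;
    }

    /** Number of block groups currently queued. */
    int size() {
        return missingByGroup.size();
    }

    /** Number of bad internal blocks recorded for the given group. */
    int missingCount(long blockGroupId) {
        BitSet missing = missingByGroup.get(blockGroupId);
        return missing == null ? 0 : missing.cardinality();
    }

    /**
     * Removes and returns the ID of the group with the most bad internal
     * blocks, or null if the queue is empty. Ties are broken arbitrarily.
     */
    Long pollMostUrgent() {
        Long best = null;
        int bestCount = -1;
        for (Map.Entry<Long, BitSet> e : missingByGroup.entrySet()) {
            int count = e.getValue().cardinality();
            if (count > bestCount) {
                best = e.getKey();
                bestCount = count;
            }
        }
        if (best != null) {
            missingByGroup.remove(best);
        }
        return best;
    }
}
```

With this shape, a group that loses 3 internal blocks still occupies a single entry, so a scheduler polling the queue would see that group once and could build a single reconstruction task covering all of its missing indices. A real implementation would also need locking and integration with the existing priority levels, which this sketch leaves out.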

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9822
>                 URL: https://issues.apache.org/jira/browse/HDFS-9822
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Rakesh R
>         Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>       at java.lang.Thread.run(Thread.java:745)
>       at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>       at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>       at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)