[ https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183810#comment-15183810 ]
Jing Zhao commented on HDFS-8786:
---------------------------------

Thanks for updating the patch, [~rakeshr]! The 004 patch looks pretty good to me. Some minor comments:
# Not all of the decommissioning nodes can later be used as source nodes, since we still need to consider each DataNode's current load etc., so I'm not sure the calculation here is correct. Meanwhile, I do not think we should adjust {{additionalReplRequired}} at all: there is no need to leave decommissioning nodes to the next round. Thus it looks like we do not need this change.
{code}
      // should reconstruct all the internal blocks before scheduling
      // replication task for decommissioning node(s).
      if (additionalReplRequired - numReplicas.decommissioning() > 0) {
        additionalReplRequired = additionalReplRequired
            - numReplicas.decommissioning();
      }
{code}
# We actually need to track whether the reconstruction work is triggered only by "not enough racks". For normal EC reconstruction work we may also not have enough racks, but we would still keep {{enoughRack}} as true. Thus a boolean {{notEnoughRack}} may be more accurate (see the sketch after this list).
{code}
  private boolean enoughRack = true;
{code}
# {{DatanodeManager#sortLocatedStripedBlocks}} can be added in a separate jira. We can also add some new tests for this change.
# In {{ErasureCodingWork}}, we can put the not-enough-rack logic and the decommissioning logic into two separate methods. Also, if the number of targets is smaller than the number of sources, we do not need to create the extra blocks in {{addTaskToDatanode}}. Maybe the code after the change can look like this:
{code}
  private void createReplicationWork(int sourceIndex,
      DatanodeStorageInfo target) {
    BlockInfoStriped stripedBlk = (BlockInfoStriped) getBlock();
    final byte blockIndex = liveBlockIndicies[sourceIndex];
    final DatanodeDescriptor source = getSrcNodes()[sourceIndex];
    final long internBlkLen = StripedBlockUtil.getInternalBlockLength(
        stripedBlk.getNumBytes(), stripedBlk.getCellSize(),
        stripedBlk.getDataBlockNum(), blockIndex);
    final Block targetBlk = new Block(stripedBlk.getBlockId() + blockIndex,
        internBlkLen, stripedBlk.getGenerationStamp());
    source.addBlockToBeReplicated(targetBlk,
        new DatanodeStorageInfo[]{target});
    if (BlockManager.LOG.isDebugEnabled()) {
      BlockManager.LOG.debug("Add replication task from source {} to "
          + "target {} for EC block {}", source, target, targetBlk);
    }
  }

  private List<Integer> findDecommissioningSources() {
    List<Integer> srcIndices = new ArrayList<>();
    for (int i = 0; i < getSrcNodes().length; i++) {
      if (getSrcNodes()[i].isDecommissionInProgress()) {
        srcIndices.add(i);
      }
    }
    return srcIndices;
  }

  @Override
  void addTaskToDatanode(NumberReplicas numberReplicas) {
    final DatanodeStorageInfo[] targets = getTargets();
    assert targets.length > 0;
    BlockInfoStriped stripedBlk = (BlockInfoStriped) getBlock();

    if (hasNotEnoughRack()) {
      // if we already have all the internal blocks, but not enough racks,
      // we only need to replicate one internal block to a new rack
      int sourceIndex = chooseSource4SimpleReplication();
      createReplicationWork(sourceIndex, targets[0]);
    } else if (numberReplicas.decommissioning() > 0 && hasAllInternalBlocks()) {
      List<Integer> decommissioningSources = findDecommissioningSources();
      // decommissioningSources.size() should be >= targets.length
      final int num = Math.min(decommissioningSources.size(), targets.length);
      for (int i = 0; i < num; i++) {
        createReplicationWork(decommissioningSources.get(i), targets[i]);
      }
    } else {
      targets[0].getDatanodeDescriptor().addBlockToBeErasureCoded(
          new ExtendedBlock(blockPoolId, stripedBlk), getSrcNodes(), targets,
          getLiveBlockIndicies(), stripedBlk.getErasureCodingPolicy());
    }
  }
{code}
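To make point 2 concrete, here is a minimal sketch of the renamed flag, assuming it lives in {{ErasureCodingWork}} and matching the {{hasNotEnoughRack()}} call used in the code above; the setter name is illustrative, not from the 004 patch:
{code}
  // Sketch only: defaults to false and is flipped only when reconstruction
  // is triggered purely by rack placement, so normal EC reconstruction work
  // that happens to span too few racks leaves it untouched.
  private boolean notEnoughRack = false;

  // Hypothetical setter; the actual patch may wire this up differently.
  void setNotEnoughRack() {
    this.notEnoughRack = true;
  }

  boolean hasNotEnoughRack() {
    return notEnoughRack;
  }
{code}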

> Erasure coding: DataNode should transfer striped blocks before being
> decommissioned
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-8786
>                 URL: https://issues.apache.org/jira/browse/HDFS-8786
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Rakesh R
>         Attachments: HDFS-8786-001.patch, HDFS-8786-002.patch,
> HDFS-8786-003.patch, HDFS-8786-004.patch, HDFS-8786-draft.patch
>
>
> Per [discussion|https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004] under HDFS-8697, it's too expensive to reconstruct block groups for decomm purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)