[ https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183810#comment-15183810 ]

Jing Zhao commented on HDFS-8786:
---------------------------------

Thanks for updating the patch, [~rakeshr]! The 004 patch looks pretty good to 
me. Some minor comments:
# Not all decommissioning nodes can later be used as source nodes, since we 
still need to consider each DataNode's current load etc. Thus I'm not sure the 
calculation here is correct. Meanwhile, I do not think we should adjust 
{{additionalReplRequired}}: there is no need to leave decommissioning nodes to 
the next round. So it looks like we do not need this change.
{code}
      // should reconstruct all the internal blocks before scheduling
      // replication task for decommissioning node(s).
      if (additionalReplRequired - numReplicas.decommissioning() > 0) {
        additionalReplRequired = additionalReplRequired
            - numReplicas.decommissioning();
      }
{code}
# We actually need to track whether the reconstruction work is triggered only 
by "not enough racks". For normal EC reconstruction work we may also not have 
enough racks, but we would still keep {{enoughRack}} as true. Thus a boolean 
{{notEnoughRack}} may be more accurate.
{code}
private boolean enoughRack = true;
{code}
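For example, just a sketch of what I have in mind (the {{setNotEnoughRack}} 
setter is only illustrative, not code from the patch): the renamed flag would 
default to false, and only the not-enough-rack path would set it, which also 
matches the {{hasNotEnoughRack()}} check used in the {{addTaskToDatanode}} 
sketch below.
{code}
  // Sketch only: default to false; set it only when reconstruction is
  // triggered purely because the block lacks rack diversity.
  private boolean notEnoughRack = false;

  void setNotEnoughRack() {
    this.notEnoughRack = true;
  }

  boolean hasNotEnoughRack() {
    return notEnoughRack;
  }
{code}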
# {{DatanodeManager#sortLocatedStripedBlocks}} can be added in a separate 
JIRA. We can also add some new tests for that change.
# In {{ErasureCodingWork}}, we can put the not-enough-rack logic and the 
decommissioning logic into two separate methods. Also, if the number of 
targets is smaller than the number of sources, we do not need to create the 
block in {{addTaskToDatanode}} for the extra sources. After the change the 
code could look like this:
{code}
  private void createReplicationWork(int sourceIndex,
      DatanodeStorageInfo target) {
    BlockInfoStriped stripedBlk = (BlockInfoStriped) getBlock();
    final byte blockIndex = liveBlockIndicies[sourceIndex];
    final DatanodeDescriptor source = getSrcNodes()[sourceIndex];
    final long internBlkLen = StripedBlockUtil.getInternalBlockLength(
        stripedBlk.getNumBytes(), stripedBlk.getCellSize(),
        stripedBlk.getDataBlockNum(), blockIndex);
    final Block targetBlk = new Block(
        stripedBlk.getBlockId() + blockIndex, internBlkLen,
        stripedBlk.getGenerationStamp());
    source.addBlockToBeReplicated(targetBlk, new DatanodeStorageInfo[]{target});
    if (BlockManager.LOG.isDebugEnabled()) {
      BlockManager.LOG.debug("Add replication task from source {} to "
          + "target {} for EC block {}", source, target, targetBlk);
    }
  }

  private List<Integer> findDecommissioningSources() {
    List<Integer> srcIndices = new ArrayList<>();
    for (int i = 0; i < getSrcNodes().length; i++) {
      if (getSrcNodes()[i].isDecommissionInProgress()) {
        srcIndices.add(i);
      }
    }
    return srcIndices;
  }

  @Override
  void addTaskToDatanode(NumberReplicas numberReplicas) {
    final DatanodeStorageInfo[] targets = getTargets();
    assert targets.length > 0;
    BlockInfoStriped stripedBlk = (BlockInfoStriped) getBlock();

    if (hasNotEnoughRack()) {
      // if we already have all the internal blocks, but not enough racks,
      // we only need to replicate one internal block to a new rack
      int sourceIndex = chooseSource4SimpleReplication();
      createReplicationWork(sourceIndex, targets[0]);
    } else if (numberReplicas.decommissioning() > 0 && hasAllInternalBlocks()) {
      List<Integer> decommissioningSources = findDecommissioningSources();
      // decommissioningSources.size() should be >= targets.length
      final int num = Math.min(decommissioningSources.size(), targets.length);
      for (int i = 0; i < num; i++) {
        createReplicationWork(decommissioningSources.get(i), targets[i]);
      }
    } else {
      targets[0].getDatanodeDescriptor().addBlockToBeErasureCoded(
          new ExtendedBlock(blockPoolId, stripedBlk),
          getSrcNodes(), targets, getLiveBlockIndicies(),
          stripedBlk.getErasureCodingPolicy());
    }
  }
{code}
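The {{chooseSource4SimpleReplication}} helper called above is not shown here. 
Just to illustrate one possible shape (the actual implementation in the patch 
may differ): pick a source index from the rack that currently holds the most 
internal blocks, so that replicating one block out of it best improves rack 
diversity.
{code}
  // Sketch: group source indices by rack, then replicate out of the most
  // crowded rack so the new copy can land on a new rack.
  private int chooseSource4SimpleReplication() {
    Map<String, List<Integer>> map = new HashMap<>();
    for (int i = 0; i < getSrcNodes().length; i++) {
      final String rack = getSrcNodes()[i].getNetworkLocation();
      List<Integer> dnList = map.get(rack);
      if (dnList == null) {
        dnList = new ArrayList<>();
        map.put(rack, dnList);
      }
      dnList.add(i);
    }
    // pick any source index from the most crowded rack
    List<Integer> max = null;
    for (List<Integer> dnList : map.values()) {
      if (max == null || dnList.size() > max.size()) {
        max = dnList;
      }
    }
    assert max != null;
    return max.get(0);
  }
{code}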

> Erasure coding: DataNode should transfer striped blocks before being 
> decommissioned
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-8786
>                 URL: https://issues.apache.org/jira/browse/HDFS-8786
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Rakesh R
>         Attachments: HDFS-8786-001.patch, HDFS-8786-002.patch, 
> HDFS-8786-003.patch, HDFS-8786-004.patch, HDFS-8786-draft.patch
>
>
> Per [discussion|https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004] 
> under HDFS-8697, it's too expensive to reconstruct block groups for 
> decommissioning purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
