[ 
https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249133#comment-15249133
 ] 

Kai Zheng commented on HDFS-8449:
---------------------------------

Some comments. Could you help check? Thanks, Bo.
1. In {{ErasureCodingWorker}}, regarding the following change: the counters 
don't belong here, because at this point we only know whether the task was 
submitted successfully, not whether it actually executed successfully. The 
right place would be the {{run()}} method of the {{StripedReconstructor}} 
task, which is a {{Runnable}}. We needn't worry too much about tasks with 
invalid targets, because such tasks should eventually be avoided on the NN side.
{code}
  public void processErasureCodingTasks(
      Collection<BlockECReconstructionInfo> ecTasks) {
    for (BlockECReconstructionInfo reconstructionInfo : ecTasks) {
      try {
        final StripedReconstructor task =
            new StripedReconstructor(this, reconstructionInfo);
        if (task.hasValidTargets()) {
          stripedReconstructionPool.submit(task);
+          datanode.getMetrics().incrECReconstructionTasks();
        } else {
          LOG.warn("No missing internal block. Skip reconstruction for task:{}",
              reconstructionInfo);
        }
      } catch (Throwable e) {
        LOG.warn("Failed to reconstruct striped block {}",
            reconstructionInfo.getExtendedBlock().getLocalBlock(), e);
+        datanode.getMetrics().incrECFailedReconstructionTasks();
      }
    }
  }
{code}
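
To illustrate the suggestion in point 1, here is a minimal, self-contained 
sketch of counting inside {{run()}} instead of at submission time. It is not 
the actual patch: {{Metrics}}, {{ReconstructorTask}}, {{runTasks}} and the 
{{shouldFail}} flag are hypothetical stand-ins for the datanode metrics and 
{{StripedReconstructor}}, and whether the total should be counted when the 
task starts or when it finishes is an assumption left open here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ReconstructionMetricsSketch {

  /** Hypothetical stand-in for the datanode metrics object. */
  static class Metrics {
    final AtomicLong tasks = new AtomicLong();
    final AtomicLong failedTasks = new AtomicLong();

    void incrECReconstructionTasks() { tasks.incrementAndGet(); }
    void incrECFailedReconstructionTasks() { failedTasks.incrementAndGet(); }
  }

  /** Hypothetical stand-in for the StripedReconstructor task. */
  static class ReconstructorTask implements Runnable {
    private final Metrics metrics;
    private final boolean shouldFail; // simulates a reconstruction error

    ReconstructorTask(Metrics metrics, boolean shouldFail) {
      this.metrics = metrics;
      this.shouldFail = shouldFail;
    }

    @Override
    public void run() {
      // Count the task when it actually executes, not when it is submitted.
      metrics.incrECReconstructionTasks();
      try {
        reconstruct();
      } catch (Throwable e) {
        // The failure counter reflects the real outcome of the execution.
        metrics.incrECFailedReconstructionTasks();
      }
    }

    private void reconstruct() {
      if (shouldFail) {
        throw new IllegalStateException("simulated reconstruction failure");
      }
    }
  }

  /** Submits one task per flag, waits for completion, returns the metrics. */
  static Metrics runTasks(boolean... failures) {
    Metrics metrics = new Metrics();
    ExecutorService pool = Executors.newFixedThreadPool(2);
    for (boolean fail : failures) {
      // Submission itself no longer touches any counter.
      pool.submit(new ReconstructorTask(metrics, fail));
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return metrics;
  }

  public static void main(String[] args) {
    Metrics m = runTasks(false, true, false);
    // prints "tasks=3 failed=1"
    System.out.println("tasks=" + m.tasks.get() + " failed=" + m.failedTasks.get());
  }
}
```

With the counters inside {{run()}}, a task that is accepted by the pool but 
fails during reconstruction is still recorded as a failure, which the 
submission-time counters in the change above would miss.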

2. It's good to see new tests for this. As {{TestReconstructStripedFile}} 
already covers all sorts of cases in which reconstruction tasks can happen, 
could we extend it and add the metrics-related checks there?

> Add tasks count metrics to datanode for ECWorker
> ------------------------------------------------
>
>                 Key: HDFS-8449
>                 URL: https://issues.apache.org/jira/browse/HDFS-8449
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8449-000.patch, HDFS-8449-001.patch, 
> HDFS-8449-002.patch, HDFS-8449-003.patch, HDFS-8449-004.patch
>
>
> This sub-task tries to record the EC recovery tasks that a datanode has done, 
> including total tasks, failed tasks and successful tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
