[jira] [Commented] (HDFS-11447) Ozone: SCM: Send node report along with heartbeat to SCM

Anu Engineer (JIRA) Tue, 28 Feb 2017 17:48:15 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-11447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889298#comment-15889298
 ]


Anu Engineer commented on HDFS-11447:
-------------------------------------

[~xyao] The changes look excellent. Thank you for the patch. Some minor 
comments.

* nit: Storage Location Report -- could we rename this? I don't have a better 
suggestion, though.

* ContainerLocationManager.java
Line 109: I see that we have a TODO to handle failed volumes.
Just a suggestion: do you want to wrap the calls to getScmUsed() and 
getAvailable() in a try/catch, so that if one location throws, you can still 
get info about the other locations?
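A rough sketch of the try/catch idea, not code from the patch. The 
{{StorageLocation}} interface, {{LocationReport}} and {{LocationReporter}} are 
hypothetical names; only getScmUsed() and getAvailable() come from the patch, 
and I am assuming they can throw an IOException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a storage location; not the class from the patch.
interface StorageLocation {
  String getPath();
  long getScmUsed() throws IOException;
  long getAvailable() throws IOException;
}

// Hypothetical per-location report entry.
final class LocationReport {
  final String path;
  final long scmUsed;
  final long available;
  final boolean failed;

  LocationReport(String path, long scmUsed, long available, boolean failed) {
    this.path = path;
    this.scmUsed = scmUsed;
    this.available = available;
    this.failed = failed;
  }
}

final class LocationReporter {
  // Builds one report per location. A failure on one volume is caught inside
  // the loop, so the remaining volumes are still reported.
  static List<LocationReport> buildReports(List<StorageLocation> locations) {
    List<LocationReport> reports = new ArrayList<>();
    for (StorageLocation loc : locations) {
      try {
        reports.add(new LocationReport(
            loc.getPath(), loc.getScmUsed(), loc.getAvailable(), false));
      } catch (IOException e) {
        // Assumption: a failed volume is reported with zero usage and a flag.
        reports.add(new LocationReport(loc.getPath(), 0L, 0L, true));
      }
    }
    return reports;
  }
}
```

Whether a failed volume should appear in the report with a flag or be left out 
entirely is a separate decision; the sketch only shows that the catch has to 
sit inside the loop.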

* We already have a proper shutdown call in StateMachine. Would it make sense 
to save the value in that code instead of adding a shutdown hook? It would be 
simpler, and all the shutdown code would be in one place.
{code}
    // Make the scm usage to be saved during shutdown.
    ShutdownHookManager.get().addShutdownHook(
        new Runnable() {
          @Override
          public void run() {
            if (!scmUsedSaved) {
              saveScmUsed();
            }
          }
        }, SHUTDOWN_HOOK_PRIORITY);
{code}
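Something along these lines is what I have in mind. This is only a sketch: 
{{ScmUsageStore}} and {{StateMachineSketch}} are hypothetical names, and the 
real StateMachine shutdown method may look different.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for whatever owns the scmUsed value in the patch.
final class ScmUsageStore {
  private final AtomicBoolean saved = new AtomicBoolean(false);
  private int saveCount = 0;

  // Saves the value at most once, even if shutdown is requested twice.
  void saveScmUsed() {
    if (saved.compareAndSet(false, true)) {
      saveCount++; // a real implementation would persist the value here
    }
  }

  int getSaveCount() {
    return saveCount;
  }
}

// Hypothetical state machine; only the shutdown path is sketched.
final class StateMachineSketch {
  private final ScmUsageStore usageStore;

  StateMachineSketch(ScmUsageStore usageStore) {
    this.usageStore = usageStore;
  }

  // The save joins the other teardown steps instead of living in a
  // separately registered shutdown hook.
  void stop() {
    usageStore.saveScmUsed();
    // ... stop heartbeat threads, close connections, and so on.
  }
}
```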

* {{moveStaleNodeToDead()}} -- We update ScmStats in this function, but not in 
healthyToStaleNodes. Just to make sure that I understand this clearly, does 
this mean that we send traffic to stale nodes?

* {{handleHeartbeat()}} 
In the earlier code, {{monotonicNow()}} was used because the recvTime could be 
different from the time the HB processing thread actually ran. So instead of 
penalizing the HB for the time it waited in the queue, we just updated the last 
HB time to the time we saw it. That was a conscious decision. If we start 
using the recvTime, then we also need to make sure that our response time is 
within a reasonable time frame; that is, we need to guarantee that HBs get 
processed within a window of time.
*Line 522:*  {{long lastTimestamp = hbItem.getRecvTimestamp();}}
*Line 528:*  {{healthyNodes.put(datanodeID, lastTimestamp);}}
If we update the code to use {{monotonicNow()}}, we avoid this problem of 
packet starvation caused by the queue and the HB processing thread. See the 
comments at *line 431* about how the HB thread assumes that no issues come from 
not being able to run in real time.
In fact, I suggest that we use the difference between 
{{hbItem.getRecvTimestamp()}} and the current time as a metric that tells us 
the average time a HB packet spends in the queue. It is a very useful metric 
to have.
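To make the suggestion concrete, here is a minimal sketch, not the actual SCM 
code. {{HeartbeatItem}}, {{HeartbeatProcessor}} and the injected clock are 
hypothetical; in the real code the clock would be {{monotonicNow()}}, and I am 
assuming the receive timestamp is taken from that same clock so that the 
difference is meaningful.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical queued heartbeat: node id plus the time it was received.
final class HeartbeatItem {
  final String datanodeId;
  final long recvTimestamp; // assumed to come from the same monotonic clock

  HeartbeatItem(String datanodeId, long recvTimestamp) {
    this.datanodeId = datanodeId;
    this.recvTimestamp = recvTimestamp;
  }
}

final class HeartbeatProcessor {
  private final Map<String, Long> healthyNodes = new HashMap<>();
  private final LongSupplier clock; // stand-in for monotonicNow()
  private long totalQueueMs = 0L;
  private long processed = 0L;

  HeartbeatProcessor(LongSupplier clock) {
    this.clock = clock;
  }

  void handleHeartbeat(HeartbeatItem hb) {
    long now = clock.getAsLong();
    // Queue wait goes into a metric instead of counting against the node.
    totalQueueMs += Math.max(0L, now - hb.recvTimestamp);
    processed++;
    // The last-seen time is the processing time, not the receive time.
    healthyNodes.put(hb.datanodeId, now);
  }

  double averageQueueMs() {
    return processed == 0 ? 0.0 : (double) totalQueueMs / processed;
  }

  Long lastSeen(String datanodeId) {
    return healthyNodes.get(datanodeId);
  }
}
```

With this shape a slow HB thread shows up as a rising average queue time 
rather than as nodes being marked stale.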

* {{handleHeartbeat()}} -- Do we need to call the function below three times, 
inside three separate if statements, or can we move it outside and call it once? 
{{updateNodeStat(datanodeID, nodeReport);}}
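A sketch of the hoisted call, under assumptions: I am guessing that the three 
if statements correspond to the healthy, stale and dead node maps, and 
{{NodeTableSketch}} with its fields is a hypothetical simplification, not the 
node manager from the patch.

```java
import java.util.HashMap;
import java.util.Map;

final class NodeTableSketch {
  final Map<String, Long> healthyNodes = new HashMap<>();
  final Map<String, Long> staleNodes = new HashMap<>();
  final Map<String, Long> deadNodes = new HashMap<>();
  final Map<String, Long> usedBytes = new HashMap<>();
  int statUpdates = 0;

  void handleHeartbeat(String datanodeId, long now, long reportedUsed) {
    boolean known = false;
    if (healthyNodes.containsKey(datanodeId)) {
      healthyNodes.put(datanodeId, now);
      known = true;
    } else if (staleNodes.remove(datanodeId) != null) {
      healthyNodes.put(datanodeId, now); // stale node is healthy again
      known = true;
    } else if (deadNodes.remove(datanodeId) != null) {
      healthyNodes.put(datanodeId, now); // dead node came back
      known = true;
    }
    // Single call site; unknown nodes are skipped, as with the per-branch calls.
    if (known) {
      updateNodeStat(datanodeId, reportedUsed);
    }
  }

  private void updateNodeStat(String datanodeId, long reportedUsed) {
    usedBytes.put(datanodeId, reportedUsed);
    statUpdates++;
  }
}
```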

> Ozone: SCM: Send node report along with heartbeat to SCM
> --------------------------------------------------------
>
>                 Key: HDFS-11447
>                 URL: https://issues.apache.org/jira/browse/HDFS-11447
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>         Attachments: HDFS-11447-HDFS-7240.001.patch
>
>
> The storage utilization information on datanode should be reported to SCM to 
> help decide container allocation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
