[ https://issues.apache.org/jira/browse/HDFS-11447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889298#comment-15889298 ]
Anu Engineer commented on HDFS-11447: ------------------------------------- [~xyao] The changes look excellent. Thank you for the patch. Some minor comments. * nit: Storage Location Report -- Rename ?? -- But I don't have a better suggestion. * ContainerLocationManager.java Line 109: I see that we have todo to handle failed volume. Just a suggestion: Do you want to put the code in the getScmUsed() and getAvailable() in a try catch so if it throws, you still can get info about the other locations. * We already have proper shutdown call in StateMachine, would it makes sense to save the value in that code insted of adding a shutdown hook. It will simpler and all shutdown code will be in one place. {code} // Make the scm usage to be saved during shutdown. ShutdownHookManager.get().addShutdownHook( new Runnable() { @Override public void run() { if (!scmUsedSaved) { saveScmUsed(); } } }, SHUTDOWN_HOOK_PRIORITY); {code} * {{moveStaleNodeToDead()}} -- We update ScmStats in this function, but not in healthyToStaleNodes. Just to make sure that I understand this clearly, does this mean that we send traffic to stale nodes ? * {{handleHeartbeat()}} In the earlier code {{monotonicNow()}} was used since the recvTime could be different from when the HB processing thread actually ran. So instead of penalizing HB wait time in the queue, we just update the last HB time as the time we saw it. That was a conscious decision, if we start updating the recvTime, then we also need to make sure that our response time is within a reasonable time frame. That is we need to guarantee the HB's will get processed within a window of time. *Line 522:* {{long lastTimestamp = hbItem.getRecvTimestamp();}} *Line 528:* {{healthyNodes.put(datanodeID, lastTimestamp);}} if we update the code to use monotonicNow() we avoid this problem of packet starvation due to queue and HB processing thread. See the comments in *line 431* about how HB thread assumes that no issues comes from not being able to run in real time. In fact, I suggest that we use the {{hbItem.getRecvTimestamp()}} and current time difference as a metric which lets us know the average time we queue a HB packet. It is very useful metric to have. * {{handleHeartbeat()}} -- Do we need to call this function 3 times inside 3 separate if statements or can we move it outside once ? {{updateNodeStat(datanodeID, nodeReport);}} > Ozone: SCM: Send node report along with heartbeat to SCM > -------------------------------------------------------- > > Key: HDFS-11447 > URL: https://issues.apache.org/jira/browse/HDFS-11447 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Xiaoyu Yao > Assignee: Xiaoyu Yao > Attachments: HDFS-11447-HDFS-7240.001.patch > > > The storage utilization information on datanode should be reported to SCM to > help decide container allocation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org