[jira] [Comment Edited] (HDFS-16320) Datanode retrieve slownode information from NameNode

Janus Chow (Jira) Sun, 14 Nov 2021 06:14:07 -0800


    [ https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443351#comment-17443351 ]


Janus Chow edited comment on HDFS-16320 at 11/14/21, 2:13 PM:
--------------------------------------------------------------

[~hexiaoqiao] 
{quote}I mean that the DataNode has all the information it needs to decide whether it is SLOW 
based on response time or throughput, rather than relying on a command from the 
NameNode. Furthermore, false positives are possible on the NameNode side.
{quote}
In my opinion, the slownode information on the NameNode is effectively a consensus drawn 
from many DataNodes. Every slownode is reported by other DataNodes (calculated from 
their statistics), and the NameNode aggregates these reports and picks the DataNodes 
reported most often. Up to this point, the "slownode" data should be quite reliable.
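The aggregation step described above can be sketched as follows. This is a simplified, hypothetical illustration, not Hadoop's actual SlowPeerTracker: it assumes each report is a (slowNode, reportingNode) pair, counts distinct reporters per node, and ignores report timestamps and expiry.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical, simplified NameNode-side aggregation of slow-peer reports.
 * Not Hadoop's SlowPeerTracker: report timestamps and expiry are omitted.
 */
public class SlowPeerAggregator {
  // slow node -> DataNodes that reported it as slow (a Set, so repeats count once)
  private final Map<String, Set<String>> reportersBySlowNode = new HashMap<>();

  /** Records that reportingNode observed slowNode as slow. */
  public void addReport(String slowNode, String reportingNode) {
    reportersBySlowNode.computeIfAbsent(slowNode, k -> new HashSet<>()).add(reportingNode);
  }

  /** Returns up to topN nodes, most distinct reporters first; ties broken by name. */
  public List<String> getTopSlowNodes(int topN) {
    Comparator<Map.Entry<String, Set<String>>> byReporterCount =
        Comparator.comparingInt(e -> e.getValue().size());
    return reportersBySlowNode.entrySet().stream()
        .sorted(byReporterCount.reversed().thenComparing(e -> e.getKey()))
        .limit(topN)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    SlowPeerAggregator agg = new SlowPeerAggregator();
    agg.addReport("dn3", "dn1");
    agg.addReport("dn3", "dn2");
    agg.addReport("dn3", "dn1"); // repeated report from dn1 is counted once
    agg.addReport("dn5", "dn4");
    System.out.println(agg.getTopSlowNodes(2)); // prints [dn3, dn5]
  }
}
```

Counting distinct reporters rather than raw reports is what makes the result a consensus: a single DataNode with a bad link cannot push a peer to the top on its own.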
{quote}I am not against the idea, but we should find a more proper way to solve 
this problem.
{quote}
I tried to find other ways to identify slow nodes, especially from the DataNodes 
themselves. But after checking the implementation of "OutlierDetector.java" 
and "DataNodePeerMetrics.java", I think the current calculation is already very good at 
spotting slow nodes.
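For reference, the kind of peer-latency outlier test that OutlierDetector performs can be sketched as below: a peer is flagged when its latency exceeds the largest of a fixed floor, a multiple of the median, and the median plus a multiple of the scaled median absolute deviation (MAD). This is a standalone sketch; the class name, multipliers and thresholds are illustrative assumptions, not a copy of the Hadoop source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative median/MAD latency outlier test; constants are assumptions. */
public class LatencyOutlierSketch {
  private static final double MEDIAN_MULTIPLIER = 3.0;    // assumed
  private static final double DEVIATION_MULTIPLIER = 3.0; // assumed
  // Scales MAD so it estimates a standard deviation for normally distributed data.
  private static final double MAD_SCALE = 1.4826;

  private final int minNumResources;  // need this many peers before judging
  private final double lowThresholdMs; // never flag latencies at or below this floor

  public LatencyOutlierSketch(int minNumResources, double lowThresholdMs) {
    this.minNumResources = minNumResources;
    this.lowThresholdMs = lowThresholdMs;
  }

  /** Median of an already sorted, non-empty list. */
  static double median(List<Double> sorted) {
    int n = sorted.size();
    return (n % 2 == 1) ? sorted.get(n / 2)
        : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
  }

  /** Returns the peers whose latency is above the computed upper limit. */
  public Map<String, Double> getOutliers(Map<String, Double> latencies) {
    if (latencies.size() < minNumResources) {
      return Collections.emptyMap(); // too few samples to call anyone slow
    }
    List<Double> sorted = new ArrayList<>(latencies.values());
    Collections.sort(sorted);
    double med = median(sorted);

    List<Double> deviations = new ArrayList<>();
    for (double v : sorted) {
      deviations.add(Math.abs(v - med));
    }
    Collections.sort(deviations);
    double mad = median(deviations) * MAD_SCALE;

    double upper = Math.max(lowThresholdMs, med * MEDIAN_MULTIPLIER);
    upper = Math.max(upper, med + DEVIATION_MULTIPLIER * mad);

    Map<String, Double> slow = new TreeMap<>();
    for (Map.Entry<String, Double> e : latencies.entrySet()) {
      if (e.getValue() > upper) {
        slow.put(e.getKey(), e.getValue());
      }
    }
    return slow;
  }

  public static void main(String[] args) {
    Map<String, Double> lat = new HashMap<>();
    lat.put("dn1", 10.0);
    lat.put("dn2", 12.0);
    lat.put("dn3", 11.0);
    lat.put("dn4", 9.0);
    lat.put("dn5", 200.0);
    System.out.println(new LatencyOutlierSketch(3, 5.0).getOutliers(lat)); // prints {dn5=200.0}
  }
}
```

Using the median and MAD instead of the mean and standard deviation keeps one very slow peer from inflating the threshold and hiding itself.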
{quote}client and DataNode/Pipeline communication could estimate if there are 
slow nodes and which one is slow
{quote}
Since the client only talks to the first DataNode in a pipeline, it would be 
difficult for it to tell where the slowness lies among the three DataNodes. I 
think that's why slowness is only calculated on the hop from the penultimate node 
to the last node.

Another point: this ticket is essentially about exposing a slowness status. So far, 
the DataNode only shows in its metrics the slowness state tagged by each NameNode. 
It is a near-real-time status refreshed by heartbeat.
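The per-NameNode status described above could be modeled on the DataNode roughly like this. It is a hypothetical sketch: the class, the method names and the boolean flag assumed to arrive with each heartbeat response are illustrative assumptions, not the actual HDFS-16320 patch.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical DataNode-side view of "am I tagged slow?" per NameNode. */
public class SlowNodeStatus {
  // NameNode id -> whether that NameNode currently tags this DataNode as slow.
  // Concurrent because heartbeats to different NameNodes run on different threads.
  private final Map<String, Boolean> slowByNameNode = new ConcurrentHashMap<>();

  /** Called on every heartbeat response; overwrites the previous state for that NameNode. */
  public void onHeartbeatResponse(String nameNodeId, boolean taggedSlow) {
    slowByNameNode.put(nameNodeId, taggedSlow);
  }

  /** True if at least one NameNode currently tags this DataNode as slow. */
  public boolean isSlowForAnyNameNode() {
    return slowByNameNode.containsValue(Boolean.TRUE);
  }

  /** Snapshot sorted by NameNode id, e.g. for exposing through metrics. */
  public Map<String, Boolean> snapshot() {
    return new TreeMap<>(slowByNameNode);
  }

  public static void main(String[] args) {
    SlowNodeStatus status = new SlowNodeStatus();
    status.onHeartbeatResponse("nn1", true);
    status.onHeartbeatResponse("nn2", false);
    System.out.println(status.isSlowForAnyNameNode()); // prints true
    status.onHeartbeatResponse("nn1", false); // next heartbeat clears the tag
    System.out.println(status.snapshot()); // prints {nn1=false, nn2=false}
  }
}
```

Because each heartbeat simply overwrites the previous value, the status is only as fresh as the heartbeat interval, which matches the "real-time status updated by heartbeat" description.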


was (Author: symious): the same comment without the leading [~hexiaoqiao] mention.

> Datanode retrieve slownode information from NameNode
> ----------------------------------------------------
>
>                 Key: HDFS-16320
>                 URL: https://issues.apache.org/jira/browse/HDFS-16320
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Janus Chow
>            Assignee: Janus Chow
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, slownode information is reported by each reportingNode and stored 
> in the NameNode.
> This ticket lets the slow node retrieve that information from the NameNode, so 
> that it can take further performance-improvement actions based on this 
> information.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
