[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode

2021-11-14 Thread Janus Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443351#comment-17443351
 ] 

Janus Chow commented on HDFS-16320:
---

{quote}I mean that the DataNode has all the information it needs to decide 
whether it is SLOW, based on response time or throughput, rather than on a 
command from the NameNode. Furthermore, false positives are possible on the 
NameNode side.
{quote}
In my opinion, the slownode information on the NameNode is a kind of consensus 
among the DataNodes. All slownodes are reported by other DataNodes (calculated 
from statistics), and the NameNode summarizes the reports and chooses the top 
reported DataNodes. Up to this point, the "slownode" data should be fairly 
reliable.
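
The summary step described above can be sketched roughly as follows. This is 
only an illustration, not the NameNode's actual code: the class 
SlowNodeSummary and its methods are hypothetical names, and it assumes that 
"top reported" means ranking slownodes by the number of distinct DataNodes 
that reported them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the names and the ranking rule are assumptions,
// not code taken from HDFS.
class SlowNodeSummary {
    // slow node -> set of DataNodes that reported it as slow
    private final Map<String, Set<String>> reports = new HashMap<>();

    // Record that reportingNode observed slowNode as an outlier.
    void addReport(String reportingNode, String slowNode) {
        reports.computeIfAbsent(slowNode, k -> new HashSet<>()).add(reportingNode);
    }

    // Return up to n slow nodes, most-reported first (ties broken by name).
    List<String> topReported(int n) {
        List<String> nodes = new ArrayList<>(reports.keySet());
        nodes.sort(Comparator
            .comparingInt((String node) -> reports.get(node).size())
            .reversed()
            .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(nodes.subList(0, Math.min(n, nodes.size())));
    }

    public static void main(String[] args) {
        SlowNodeSummary summary = new SlowNodeSummary();
        summary.addReport("dn1", "dn3");
        summary.addReport("dn2", "dn3");
        summary.addReport("dn1", "dn4");
        // dn3 has two distinct reporters, dn4 has one.
        System.out.println(summary.topReported(1));
    }
}
```

Counting distinct reporters rather than raw reports is one way to keep a 
single noisy DataNode from pushing a healthy peer to the top of the list.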
{quote}I am not against the idea, but we should have a more proper way to solve 
this problem.
{quote}
I tried to find other ways to identify the slownode, especially from the 
DataNodes themselves. But after checking the implementation of 
"OutlierDetector.java" and "DataNodePeerMetrics.java", I think the current 
calculation is very good at spotting the slownode.
{quote}client and DataNode/Pipeline communication could estimate if there are 
slow nodes and which one is slow
{quote}
Since in a pipeline the client only talks to the first DataNode, it could be 
difficult for the client to track slowness between the three DataNodes. I 
think that's why the slownode is only calculated on the penultimate node and 
the last node.

Another point is that what this ticket adds is a kind of slowness status. So 
far, the DataNode only shows in its metrics the slowness state tagged by each 
NameNode. It is a near-real-time status updated by heartbeat.

> Datanode retrieve slownode information from NameNode
> 
>
>                 Key: HDFS-16320
>                 URL: https://issues.apache.org/jira/browse/HDFS-16320
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Janus Chow
>            Assignee: Janus Chow
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current information of slownode is reported by reportingNode, and stored 
> in NameNode.
> This ticket is to let the slownode retrieve the information from NameNode, so 
> that it can do other performance improvement actions based on this 
> information.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode

2021-11-14 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443283#comment-17443283
 ] 

Xiaoqiao He commented on HDFS-16320:


Thanks [~Symious] for your quick response. It is a reasonable case to me. I 
mean that the DataNode has all the information it needs to decide whether it 
is SLOW, based on response time or throughput, rather than on a command from 
the NameNode. Furthermore, false positives are possible on the NameNode side.
I am not against the idea, but we should have a more proper way to solve this 
problem. IMO, client and DataNode/Pipeline communication could estimate if 
there are slow nodes and which one is slow. FYI. Thanks.




[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode

2021-11-14 Thread Janus Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443276#comment-17443276
 ] 

Janus Chow commented on HDFS-16320:
---

[~hexiaoqiao]  Thank you for the review.

The issue we met is that we have clients writing to the slownode, and it took 
a very long time to finish writing a normal file.

After checking the metrics, we found we can avoid creating pipelines on the 
slownodes by setting 
"dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled" to true. 
That works fine for new clients, but clients that already have the slownode 
in their pipeline still have to suffer from it. (The slownode may even have 
been reported by this pipeline.)

While clients are writing data, only the clients and DataNodes communicate, so 
even if the NameNode knows that a DataNode in the pipeline is slow, clients 
cannot do much to avoid it. Our proposal is to let DataNodes get the 
information via heartbeats; then, during writing, DataNodes can report it to 
clients, and clients can choose to rebuild the pipeline to improve write 
performance.
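
To make the proposed flow concrete, here is a rough sketch. It is only an 
illustration under assumptions: DataNodeStub, onHeartbeat, isMarkedSlow and 
slowMembers are hypothetical names, not existing HDFS classes or methods, and 
the real protocol messages are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed flow; none of these types exist in HDFS.
class SlowNodeFlowSketch {

    // Stand-in for a DataNode that learns its slow status from the NameNode.
    static class DataNodeStub {
        final String id;
        private volatile boolean markedSlow;

        DataNodeStub(String id) {
            this.id = id;
        }

        // Assumed hook: called with the slownode set carried back on a heartbeat.
        void onHeartbeat(Set<String> slowNodes) {
            markedSlow = slowNodes.contains(id);
        }

        // Assumed flag that the DataNode would pass on to the writing client.
        boolean isMarkedSlow() {
            return markedSlow;
        }
    }

    // Client side: collect the pipeline members that currently flag themselves slow.
    static List<String> slowMembers(List<DataNodeStub> pipeline) {
        List<String> slow = new ArrayList<>();
        for (DataNodeStub dn : pipeline) {
            if (dn.isMarkedSlow()) {
                slow.add(dn.id);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        List<DataNodeStub> pipeline = Arrays.asList(
            new DataNodeStub("dn1"), new DataNodeStub("dn2"), new DataNodeStub("dn3"));

        // Pretend the NameNode currently considers dn2 a slownode.
        Set<String> slowFromNameNode = new HashSet<>(Arrays.asList("dn2"));
        for (DataNodeStub dn : pipeline) {
            dn.onHeartbeat(slowFromNameNode);
        }

        List<String> slow = slowMembers(pipeline);
        if (!slow.isEmpty()) {
            System.out.println("Client would rebuild the pipeline without " + slow);
        }
    }
}
```

Whether the client actually rebuilds the pipeline is left as a client-side 
choice here, matching the proposal that clients "can choose" to do so.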




[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode

2021-11-14 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443268#comment-17443268
 ] 

Xiaoqiao He commented on HDFS-16320:


Thanks [~Symious] for your report and patch. IMO it is a little tricky for the 
DataNode to get the slownode status from the NameNode. In theory, the DataNode 
has all the information it needs to decide by itself whether it is slow, 
rather than following a NameNode command. Would you mind offering some more 
information about how you plan to use this status? Thanks.
