[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode
[ https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443351#comment-17443351 ]

Janus Chow commented on HDFS-16320:
---

{quote}I mean that the DataNode has all the information it needs to decide whether it is SLOW based on response time or throughput, rather than relying on a command from the NameNode. Furthermore, false positives are possible on the NameNode side.{quote}

In my opinion, the NameNode's slownode information is effectively a combined verdict from the DataNodes. Every slownode is reported by other DataNodes (based on their statistics), and the NameNode aggregates these reports and picks the most-reported DataNodes. Up to this point, the "slownode" data should be quite trustworthy.

{quote}I am not against the idea, but we should find a more proper way to solve this problem.{quote}

I tried to find other ways to detect slownodes, especially from the DataNodes themselves. But after checking the implementation of "OutlierDetector.java" and "DataNodePeerMetrics.java", I think the current calculation is very good at spotting slownodes.

{quote}client and DataNode/Pipeline communication could be used to estimate whether there are slow nodes and which one is slow{quote}

Since the client only talks to the first DataNode in a pipeline, it would be difficult for it to track slowness between the three DataNodes. I think that's why the slownode is only calculated on the penultimate node and the last node.

Another point: this ticket is essentially about exposing a slowness status. For now, the DataNode only shows in its metrics the slowness state tagged by each NameNode. It is a real-time status updated via heartbeats.
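The two-stage process described above (DataNodes flag slow peers statistically, then the NameNode ranks the reported nodes) can be illustrated with a small Java sketch. It is not the Hadoop code: the class and method names (`SlowNodeSketch`, `peerOutliers`, `topReported`) are hypothetical, and the median/MAD threshold with its constants (`MAD_SCALE`, `DEVIATION_MULTIPLIER`, `MEDIAN_MULTIPLIER`, `LOW_THRESHOLD_MS`) is only loosely modeled on the approach OutlierDetector.java is understood to take, so the real values and details may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of slownode detection; not the Hadoop implementation.
public class SlowNodeSketch {
    // Assumed constants for illustration; Hadoop's actual names and values may differ.
    static final double MAD_SCALE = 1.4826;         // makes MAD estimate the std deviation for normal data
    static final double DEVIATION_MULTIPLIER = 3.0;
    static final double MEDIAN_MULTIPLIER = 3.0;
    static final double LOW_THRESHOLD_MS = 5.0;     // never flag peers at or below this latency

    static double median(List<Double> sorted) {
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2)
                          : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    // Stage 1 (DataNode side): flag peers whose average latency is an outlier.
    static Set<String> peerOutliers(Map<String, Double> latencyMs, int minPeers) {
        if (latencyMs.size() < minPeers) {
            return Collections.emptySet();          // too few peers to compare meaningfully
        }
        List<Double> sorted = new ArrayList<>(latencyMs.values());
        Collections.sort(sorted);
        double med = median(sorted);
        List<Double> deviations = new ArrayList<>();
        for (double v : sorted) {
            deviations.add(Math.abs(v - med));
        }
        Collections.sort(deviations);
        double mad = median(deviations) * MAD_SCALE;
        double limit = Math.max(LOW_THRESHOLD_MS, med * MEDIAN_MULTIPLIER);
        limit = Math.max(limit, med + DEVIATION_MULTIPLIER * mad);
        Set<String> slow = new TreeSet<>();
        for (Map.Entry<String, Double> e : latencyMs.entrySet()) {
            if (e.getValue() > limit) {
                slow.add(e.getKey());
            }
        }
        return slow;
    }

    // Stage 2 (NameNode side): rank suspects by the number of distinct reporters.
    static List<String> topReported(Map<String, Set<String>> reportersBySlowNode, int n) {
        List<Map.Entry<String, Set<String>>> entries = new ArrayList<>(reportersBySlowNode.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue().size(), a.getValue().size());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Average ack latency (ms) one DataNode measured for each downstream peer.
        Map<String, Double> latencyMs = new HashMap<>();
        latencyMs.put("dn2", 2.0);
        latencyMs.put("dn3", 2.5);
        latencyMs.put("dn4", 3.0);
        latencyMs.put("dn5", 40.0);
        System.out.println(peerOutliers(latencyMs, 3)); // [dn5]

        // Which DataNodes reported each suspected slownode.
        Map<String, Set<String>> reports = new HashMap<>();
        reports.put("dn5", new HashSet<>(Arrays.asList("dn1", "dn2", "dn3")));
        reports.put("dn7", new HashSet<>(Arrays.asList("dn4")));
        reports.put("dn9", new HashSet<>(Arrays.asList("dn1", "dn6")));
        System.out.println(topReported(reports, 2)); // [dn5, dn9]
    }
}
```

Using the median and MAD rather than the mean and standard deviation keeps a single very slow peer from inflating the threshold that is supposed to catch it, and counting distinct reporters per suspect means one DataNode reporting repeatedly does not outweigh several independent reporters.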
> Datanode retrieve slownode information from NameNode
>
> Key: HDFS-16320
> URL: https://issues.apache.org/jira/browse/HDFS-16320
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Janus Chow
> Assignee: Janus Chow
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The current information of slownode is reported by reportingNode, and stored
> in NameNode.
> This ticket is to let the slownode retrieve the information from NameNode, so
> that it can do other performance improvement actions based on this
> information.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode
[ https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443283#comment-17443283 ]

Xiaoqiao He commented on HDFS-16320:

Thanks [~Symious] for your quick response. That is a reasonable case to me.

I mean that the DataNode has all the information it needs to decide whether it is SLOW based on response time or throughput, rather than relying on a command from the NameNode. Furthermore, false positives are possible on the NameNode side. I am not against the idea, but we should find a more proper way to solve this problem. IMO, client and DataNode/Pipeline communication could be used to estimate whether there are slow nodes and which one is slow.

FYI. Thanks.
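One way to read the suggestion that pipeline communication could locate the slow node is to difference latencies along the pipeline. The sketch below assumes, hypothetically, that each DataNode can pass upstream the time it spent waiting for its own downstream ack; it does not claim that the HDFS ack format carries such a field. Under that assumption, the cost attributable to each node (including the link leading to it) is the difference between consecutive observations. `PipelineHopEstimator`, `hopCosts`, and `slowestNode` are illustrative names.

```java
import java.util.Arrays;

// Hypothetical sketch: attributing pipeline latency to individual nodes.
public class PipelineHopEstimator {

    /**
     * observedMs[0] is the ack round trip the client sees for the whole pipeline;
     * observedMs[i] (i >= 1) is the downstream ack wait reported by nodes[i - 1]
     * (an assumed, not existing, report). The last node has no downstream, so its
     * downstream wait counts as 0. Returns, per node, the time spent at nodes[i]
     * plus the link leading to it.
     */
    static double[] hopCosts(double[] observedMs) {
        int n = observedMs.length;
        double[] cost = new double[n];
        for (int i = 0; i < n; i++) {
            double downstream = (i + 1 < n) ? observedMs[i + 1] : 0.0;
            cost[i] = observedMs[i] - downstream;
        }
        return cost;
    }

    // Picks the node with the largest attributed cost.
    static String slowestNode(String[] nodes, double[] observedMs) {
        if (nodes.length == 0 || nodes.length != observedMs.length) {
            throw new IllegalArgumentException("need one observation per pipeline node");
        }
        double[] cost = hopCosts(observedMs);
        int worst = 0;
        for (int i = 1; i < cost.length; i++) {
            if (cost[i] > cost[worst]) {
                worst = i;
            }
        }
        return nodes[worst];
    }

    public static void main(String[] args) {
        String[] pipeline = {"dn1", "dn2", "dn3"};
        double[] observed = {50.0, 46.0, 3.0};
        System.out.println(Arrays.toString(hopCosts(observed))); // [4.0, 43.0, 3.0]
        System.out.println(slowestNode(pipeline, observed));     // dn2
    }
}
```

The estimate is idealized: it ignores overlap between a node's local disk write and its forwarding of packets, and it depends on every DataNode reporting its downstream wait.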
[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode
[ https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443276#comment-17443276 ]

Janus Chow commented on HDFS-16320:
---

[~hexiaoqiao] Thank you for the review.

The issue we hit is that clients were writing to a slownode, and it took a very long time to finish writing an ordinary file. After checking the metrics, we found that we can avoid creating pipelines on slownodes by setting "dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled" to true. That works fine for new clients, but clients whose existing pipelines already include the slownode still have to put up with it. (The slownode may even have been reported by that very pipeline.)

Since only the clients and DataNodes communicate while data is being written, even if the NameNode knows that a DataNode in the pipeline is slow, clients can't do much to avoid it.

Our proposal is to let DataNodes get this information from heartbeats; then, during writes, DataNodes can report it to clients, and clients can choose to rebuild the pipeline to improve write performance.
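The proposed flow (the NameNode's slownode verdict reaches a DataNode via heartbeat, the DataNode surfaces it during a write, and the client decides whether to rebuild the pipeline) can be sketched as follows. Everything here is hypothetical: `SlowPipelineSketch`, `DataNodeState`, `onHeartbeatResponse`, and `maybeRebuild` are illustrative names, a single NameNode is assumed, and the spare nodes, which in practice would have to be obtained from the NameNode, are simply passed in. Real pipeline recovery also has to transfer already-written data to the replacement node, which this sketch omits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the proposed flow; none of these types exist in HDFS.
public class SlowPipelineSketch {

    // DataNode side: slowness flag as last reported by the (single, assumed) NameNode.
    static class DataNodeState {
        final String id;
        // Written by the heartbeat thread, read by the write path, hence volatile.
        private volatile boolean markedSlow;

        DataNodeState(String id) { this.id = id; }

        void onHeartbeatResponse(boolean slowAccordingToNameNode) {
            markedSlow = slowAccordingToNameNode;
        }

        boolean isMarkedSlow() { return markedSlow; }
    }

    // Client side: replace pipeline members that report themselves as slow.
    static List<String> maybeRebuild(List<DataNodeState> pipeline, Deque<String> spareNodes) {
        List<String> rebuilt = new ArrayList<>();
        for (DataNodeState dn : pipeline) {
            if (dn.isMarkedSlow() && !spareNodes.isEmpty()) {
                rebuilt.add(spareNodes.poll()); // swap the slow node for a spare
            } else {
                rebuilt.add(dn.id);             // keep it (healthy, or no spare left)
            }
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        DataNodeState dn1 = new DataNodeState("dn1");
        DataNodeState dn2 = new DataNodeState("dn2");
        DataNodeState dn3 = new DataNodeState("dn3");
        dn2.onHeartbeatResponse(true); // NameNode now lists dn2 as a slownode
        List<String> result = maybeRebuild(Arrays.asList(dn1, dn2, dn3),
                new ArrayDeque<>(Arrays.asList("dn7")));
        System.out.println(result); // [dn1, dn7, dn3]
    }
}
```

A slow node is only swapped out when a spare is available, so the sketch falls back to the original pipeline rather than shrinking it when no replacement exists.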
[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode
[ https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443268#comment-17443268 ]

Xiaoqiao He commented on HDFS-16320:

Thanks [~Symious] for your report and patch. IMO, it is a little tricky for the DataNode to get its slownode status from the NameNode. In theory, the DataNode has all the information it needs to decide by itself whether it is slow, rather than following a NameNode command. Would you mind sharing some more information about how you plan to use this status? Thanks.
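For comparison, the self-assessment suggested here (a DataNode deciding from its own response times whether it is slow) might look like the sketch below. The class name `SelfSlownessCheck`, the exponentially weighted moving average, the smoothing factor, and the 20 ms threshold are all illustrative assumptions, not HDFS behavior or settings.

```java
// Hypothetical self-check: a DataNode judging its own slowness from local latencies.
public class SelfSlownessCheck {
    private final double alpha;        // EWMA smoothing factor in (0, 1]
    private final double thresholdMs;  // assumed per-packet latency limit
    private double ewmaMs = Double.NaN; // NaN until the first sample arrives

    SelfSlownessCheck(double alpha, double thresholdMs) {
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
        this.thresholdMs = thresholdMs;
    }

    // Folds one observed per-packet latency into the moving average.
    void record(double latencyMs) {
        ewmaMs = Double.isNaN(ewmaMs) ? latencyMs : alpha * latencyMs + (1 - alpha) * ewmaMs;
    }

    double average() { return ewmaMs; }

    // Slow only once there is data and the smoothed latency exceeds the threshold.
    boolean isSlow() { return !Double.isNaN(ewmaMs) && ewmaMs > thresholdMs; }

    public static void main(String[] args) {
        SelfSlownessCheck check = new SelfSlownessCheck(0.5, 20.0);
        for (double ms : new double[] {5, 6, 5}) {
            check.record(ms);
        }
        System.out.println(check.isSlow()); // false
        for (double ms : new double[] {80, 90, 85}) {
            check.record(ms);
        }
        System.out.println(check.isSlow()); // true
    }
}
```

One trade-off: a fixed local threshold cannot distinguish a slow node from a uniformly loaded cluster, whereas the peer comparison behind the NameNode's slownode reports can.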