[ https://issues.apache.org/jira/browse/HDFS-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415781#comment-17415781 ]
Aihua Xu commented on HDFS-16200:
---------------------------------

[~hexiaoqiao] Thanks for checking. Regarding improving topology resolution performance, there is TableMapping, which uses precomputed topology information, but it requires knowing the list of hosts in advance and precomputing their topology. We could convert the script into a built-in implementation, but I believe we would still hit some slowness there. In our particular case, we don't colocate storage with computing, and disabling topology resolution improved failover time from over 10 minutes to just seconds. More and more deployments now separate storage and computing. Should we have a global configuration to optimize for those cases?

> Improve NameNode failover
> -------------------------
>
>                 Key: HDFS-16200
>                 URL: https://issues.apache.org/jira/browse/HDFS-16200
>             Project: Hadoop HDFS
>          Issue Type: Task
>          Components: namanode
>    Affects Versions: 2.8.2
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> In a busy cluster, we are noticing that NameNode failover takes a long time
> (over 10 minutes), which causes cluster downtime during that period.
> One bottleneck lies in resolving the client host's topology when the
> cluster is not colocated with the computing hosts. The NameNode resolves the
> client host's topology and uses it to sort the hosts on which the blocks are
> located. The resolved topology is cached, so subsequent accesses are
> efficient; however, if the standby NameNode has just been restarted, all the
> client hosts (e.g., YARN hosts) need to be resolved.
> Possible solutions: 1) expose an API in DFSAdmin to load the topology cache,
> or 2) add a new configuration to the HDFS cluster to skip resolving topology
> for non-colocated HDFS clusters. Since client hosts and HDFS hosts are not
> colocated, it's unnecessary to sort the DataNodes for the clients.
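The caching behavior described in the issue, together with the two proposed fixes, can be sketched roughly as follows. This is a simplified, hypothetical model, not HDFS code: `TopologyCacheSketch`, `warmCache` (standing in for option 1) and the `skipClientResolution` flag (standing in for option 2) are illustrative names, not Hadoop APIs or configuration keys. The sort is also simplified to "same rack first", whereas the NameNode sorts by network distance. Hadoop's real counterparts, such as `CachedDNSToSwitchMapping` and `NetworkTopology`, differ internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Hadoop's API): a cache in front of a slow
 * host-to-rack resolver, plus a flag that skips client resolution entirely
 * for clusters where clients are not colocated with DataNodes.
 */
public class TopologyCacheSketch {
    // Rack reported when resolution is skipped; mirrors the "/default-rack" convention.
    static final String DEFAULT_RACK = "/default-rack";

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> slowResolver; // e.g. an external topology script
    private final boolean skipClientResolution;          // hypothetical config flag (option 2)
    final AtomicInteger slowCalls = new AtomicInteger(); // counts cache misses

    TopologyCacheSketch(Function<String, String> slowResolver, boolean skipClientResolution) {
        this.slowResolver = slowResolver;
        this.skipClientResolution = skipClientResolution;
    }

    /** Returns the client's rack, invoking the slow resolver only on a cache miss. */
    String resolveClient(String host) {
        if (skipClientResolution) {
            return DEFAULT_RACK; // no lookup at all
        }
        return cache.computeIfAbsent(host, h -> {
            slowCalls.incrementAndGet();
            return slowResolver.apply(h);
        });
    }

    /** Option 1 (hypothetical): pre-load the cache, e.g. on a freshly restarted standby. */
    void warmCache(List<String> hosts) {
        hosts.forEach(this::resolveClient);
    }

    /** Orders replica hosts so that those on the client's rack come first (stable sort). */
    List<String> sortReplicas(String client, List<String> replicas, Map<String, String> replicaRacks) {
        List<String> sorted = new ArrayList<>(replicas);
        if (skipClientResolution) {
            return sorted; // option 2: keep the original order, no resolution needed
        }
        String clientRack = resolveClient(client);
        sorted.sort(Comparator.comparingInt(r -> clientRack.equals(replicaRacks.get(r)) ? 0 : 1));
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, String> racks = Map.of("dn1", "/rack1", "dn2", "/rack2", "client-a", "/rack2");
        List<String> replicas = List.of("dn1", "dn2");

        TopologyCacheSketch resolving = new TopologyCacheSketch(racks::get, false);
        System.out.println(resolving.sortReplicas("client-a", replicas, racks)); // [dn2, dn1]
        resolving.sortReplicas("client-a", replicas, racks);                     // cache hit
        System.out.println(resolving.slowCalls.get());                           // 1

        TopologyCacheSketch skipping = new TopologyCacheSketch(racks::get, true);
        System.out.println(skipping.sortReplicas("client-a", replicas, racks));  // [dn1, dn2]
        System.out.println(skipping.slowCalls.get());                            // 0
    }
}
```

In this model, a restarted standby starts with an empty `cache`, so the first request from every client pays the `slowResolver` cost. Warming the cache moves that cost before failover, while skipping resolution removes it entirely, at the price of losing rack-aware ordering that is of little value when no client shares a rack with a DataNode.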
--
This message was sent by Atlassian Jira (v8.3.4#803005)