[ https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chen Liang updated HDFS-11535:
------------------------------
    Attachment: HDFS-11535.004.patch

Thanks [~arpitagarwal] for the comments! Posted the v004 patch with a number of style updates.

> Performance analysis of new DFSNetworkTopology#chooseRandom
> -----------------------------------------------------------
>
>                 Key: HDFS-11535
>                 URL: https://issues.apache.org/jira/browse/HDFS-11535
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>         Attachments: HDFS-11535.001.patch, HDFS-11535.002.patch,
> HDFS-11535.003.patch, HDFS-11535.004.patch, PerfTest.pdf
>
>
> This JIRA is created to post the results of some performance experiments we
> did. For those who are interested, please see the attached .pdf file for more
> detail. The attached patch file includes the experiment code we ran.
> The key insight we got from these tests is that, although *the new method
> outperforms the current one in most cases*, there is still *one case where
> the current one is better*: when there is only one storage type in the
> cluster and we always look for that storage type. In this case, performing
> storage-type-based pruning is simply a waste of time; blindly picking a
> random node (the current method) suffices.
> Therefore, based on the analysis, we propose to use a *combination of both
> the old and the new methods*:
> say we search for a node of type X. Since inner nodes now all keep storage
> type info, we can *just check the root node to see if X is the only type it
> has*. If yes, blindly picking a random leaf will work, so we simply call the
> old method; otherwise we call the new method.
> There is still at least one missing piece in this performance test, which is
> garbage collection. The new method creates a few more objects when doing the
> search, which adds overhead to GC. I'm still thinking about potential
> optimizations, but this seems tricky, and I'm not sure whether such an
> optimization is worth doing at all. Please feel free to leave any
> comments/suggestions.
> Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion.
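
The combined approach proposed in the description could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: every class and method name below is hypothetical, each leaf is assumed to carry exactly one storage type, and the "new" method is stood in for by a simple count-based scan rather than the real storage-type-based pruning of subtrees.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch; none of these names come from the HDFS-11535 patches.
class HybridChooseRandomSketch {
    enum StorageType { DISK, SSD, ARCHIVE }

    // A leaf (datanode); assumed here to have a single storage type.
    static final class Leaf {
        final String name;
        final StorageType type;
        Leaf(String name, StorageType type) { this.name = name; this.type = type; }
    }

    private final List<Leaf> leaves = new ArrayList<>();
    // Stand-in for the storage type info kept on the root inner node.
    private final Map<StorageType, Integer> rootTypeCounts =
        new EnumMap<>(StorageType.class);
    private final Random random;

    HybridChooseRandomSketch(Random random) { this.random = random; }

    void add(Leaf leaf) {
        leaves.add(leaf);
        rootTypeCounts.merge(leaf.type, 1, Integer::sum);
    }

    // True when the given type is the only storage type recorded at the root.
    boolean isOnlyTypeAtRoot(StorageType type) {
        return rootTypeCounts.size() == 1 && rootTypeCounts.containsKey(type);
    }

    // "Old" method: blindly pick a random leaf, ignoring storage type.
    Leaf chooseRandomBlind() {
        return leaves.isEmpty() ? null : leaves.get(random.nextInt(leaves.size()));
    }

    // Stand-in for the "new" method: pick uniformly among leaves of the
    // requested type, using the per-type count (assumption: linear scan).
    Leaf chooseRandomWithType(StorageType type) {
        int count = rootTypeCounts.getOrDefault(type, 0);
        if (count == 0) {
            return null; // no leaf of this type exists
        }
        int target = random.nextInt(count);
        for (Leaf leaf : leaves) {
            if (leaf.type == type && target-- == 0) {
                return leaf;
            }
        }
        return null; // not reached while counts match the leaf list
    }

    // Combined method: check the root first, then dispatch.
    Leaf chooseRandom(StorageType type) {
        return isOnlyTypeAtRoot(type)
            ? chooseRandomBlind()
            : chooseRandomWithType(type);
    }

    public static void main(String[] args) {
        HybridChooseRandomSketch topo = new HybridChooseRandomSketch(new Random());
        topo.add(new Leaf("dn1", StorageType.DISK));
        topo.add(new Leaf("dn2", StorageType.DISK));
        // Only DISK exists in this toy cluster, so the blind path is taken.
        System.out.println(topo.chooseRandom(StorageType.DISK).name);
    }
}
```

The root check costs one map lookup, so in the single-storage-type case the search falls through to the cheap blind pick, while mixed clusters still go through the type-aware path.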