[ https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886103#action_12886103 ]
Tsz Wo (Nicholas), SZE commented on HDFS-1140:
----------------------------------------------

> I believe this because org.apache.hadoop.hdfs.util.GSet links java.util.Map
> and Set in its javaDocs. ...

Konstantin, are you sure? The GSet patch was committed in May and there is no javadoc warning for quite a few Hudson builds, e.g. [this|https://issues.apache.org/jira/browse/HDFS-1258?focusedCommentId=12883441&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12883441], [this|https://issues.apache.org/jira/browse/HDFS-1093?focusedCommentId=12884884&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12884884] and [this|https://issues.apache.org/jira/browse/HDFS-1093?focusedCommentId=12884883&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12884883].

> Speedup INode.getPathComponents
> -------------------------------
>
>                 Key: HDFS-1140
>                 URL: https://issues.apache.org/jira/browse/HDFS-1140
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.4.patch,
> HDFS-1140.patch
>
>
> When the namenode is loading the image there is a significant amount of time
> being spent in the DFSUtil.string2Bytes. We have a very specific workload
> here. The path that namenode does getPathComponents for shares N - 1
> component with the previous path this method was called for (assuming current
> path has N components).
> Hence we can improve the image load time by caching the result of previous
> conversion.
> We thought of using some simple LRU cache for components, but the reality is,
> String.getBytes gets optimized during runtime and LRU cache doesn't perform
> as well, however using just the latest path components and their translation
> to bytes in two arrays gives quite a performance boost.
> I could get another 20% off of the time to load the image on our cluster (30
> seconds vs 24) and I wrote a simple benchmark that tests performance with and
> without caching.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
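
For readers following the issue description quoted above, here is a rough sketch of the "latest path components and their translation to bytes in two arrays" idea. It is not the code from any of the attached HDFS-1140 patches: the class name LastPathCache, the method signature, and the reused counter are all hypothetical, and the real change may handle path parsing and encoding differently. The sketch only assumes absolute paths separated by "/" and UTF-8 encoding.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration only; not the code from the HDFS-1140 patch.
 * Remembers the components of the most recently converted path and their
 * UTF-8 bytes in two parallel arrays, and reuses the bytes whenever the
 * component at the same position is unchanged.
 */
class LastPathCache {
    private String[] lastComponents = new String[0];
    private byte[][] lastBytes = new byte[0][];

    // Demonstration-only counter: components served from the cache so far.
    int reused = 0;

    byte[][] getPathComponents(String path) {
        // Assumes an absolute path such as "/a/b/c". String.split keeps the
        // leading empty component, which stands in for the root here.
        String[] components = path.split("/");
        byte[][] bytes = new byte[components.length][];
        for (int i = 0; i < components.length; i++) {
            if (i < lastComponents.length
                    && lastComponents[i].equals(components[i])) {
                // Same component at the same depth as last time:
                // skip the String-to-bytes conversion.
                bytes[i] = lastBytes[i];
                reused++;
            } else {
                bytes[i] = components[i].getBytes(StandardCharsets.UTF_8);
            }
        }
        // Only the latest path is remembered; there is no LRU bookkeeping.
        lastComponents = components;
        lastBytes = bytes;
        return bytes;
    }
}
```

With consecutive calls for "/user/data/file1" and "/user/data/file2", only the last component is re-encoded on the second call. Note that the returned inner arrays are shared with the cache in this sketch, so callers would have to treat them as read-only.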