[ https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886103#action_12886103 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1140:
----------------------------------------------

> I believe this is because org.apache.hadoop.hdfs.util.GSet links java.util.Map
> and Set in its javadocs.  ...

Konstantin, are you sure? The GSet patch was committed in May, and there have
been no javadoc warnings in quite a few Hudson builds, e.g.
[this|https://issues.apache.org/jira/browse/HDFS-1258?focusedCommentId=12883441&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12883441],
[this|https://issues.apache.org/jira/browse/HDFS-1093?focusedCommentId=12884884&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12884884] and
[this|https://issues.apache.org/jira/browse/HDFS-1093?focusedCommentId=12884883&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12884883].

> Speedup INode.getPathComponents
> -------------------------------
>
>                 Key: HDFS-1140
>                 URL: https://issues.apache.org/jira/browse/HDFS-1140
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.4.patch, HDFS-1140.patch
>
>
> When the namenode is loading the image, a significant amount of time is
> spent in DFSUtil.string2Bytes. We have a very specific workload here: the
> path for which the namenode calls getPathComponents shares N - 1 components
> with the previous path this method was called for (assuming the current path
> has N components).
> Hence we can improve the image load time by caching the result of the
> previous conversion.
> We thought of using a simple LRU cache for components, but in practice
> String.getBytes gets optimized at runtime and an LRU cache doesn't perform
> as well. However, keeping just the latest path components and their
> translations to bytes in two arrays gives quite a performance boost.
> I could get another 20% off the time to load the image on our cluster (30
> seconds vs 24), and I wrote a simple benchmark that tests performance with
> and without caching.

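The "latest path in two arrays" idea from the description can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the code from the attached patches: the class PathComponentCache and its methods are hypothetical, paths are assumed to be absolute "/"-separated strings without a trailing slash, and String.getBytes with UTF-8 stands in for DFSUtil.string2Bytes.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch (not the HDFS-1140 patch): remembers the components of
 * the most recently converted path and their byte encodings in two parallel
 * arrays, so a path sharing a leading prefix with the previous one only has
 * to encode the components that differ.
 */
final class PathComponentCache {
    private String[] lastComponents = new String[0]; // components of the previous path
    private byte[][] lastBytes = new byte[0][];      // their encodings, same indices
    private int encodeCalls = 0;                      // number of actual String-to-bytes conversions

    /**
     * Splits an absolute path such as "/a/b/c" into {"", "a", "b", "c"} and
     * encodes each component. Assumption: no trailing slash, not the bare root.
     * The inner byte arrays are shared with the cache, so callers must treat
     * them as read-only.
     */
    byte[][] getPathComponents(String path) {
        String[] components = path.split("/");
        byte[][] result = new byte[components.length][];

        // Reuse cached encodings while the leading components match the previous path.
        int shared = Math.min(components.length, lastComponents.length);
        int i = 0;
        while (i < shared && components[i].equals(lastComponents[i])) {
            result[i] = lastBytes[i];
            i++;
        }

        // Encode only the components that differ from the previous path.
        for (; i < components.length; i++) {
            result[i] = components[i].getBytes(StandardCharsets.UTF_8);
            encodeCalls++;
        }

        // Remember this path; copy the outer array so callers cannot alter the cache.
        lastComponents = components;
        lastBytes = result.clone();
        return result;
    }

    int getEncodeCalls() {
        return encodeCalls;
    }

    public static void main(String[] args) {
        PathComponentCache cache = new PathComponentCache();
        cache.getPathComponents("/user/alice/file1"); // encodes "", "user", "alice", "file1"
        cache.getPathComponents("/user/alice/file2"); // only "file2" is new
        System.out.println(cache.getEncodeCalls());
    }
}
```

Running main prints 5: the first path needs four encodings, the second only one. Keeping just the single previous path, rather than an LRU cache, fits the workload described above, where consecutive paths share all but their last component.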
-- 
This message is automatically generated by JIRA.