[ 
https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867364#action_12867364
 ] 

Dmytro Molkov commented on HDFS-1140:
-------------------------------------

@Hairong. Well, I agree with you that the conversion to String is currently 
unnecessary. I guess I was trying to argue that the format of the path in the 
image and the format of the path in memory could diverge if someone changes 
either one. In that case, having a String representation in the middle might 
simplify things.
Anyway, since the byte representation is currently the same, it does make sense 
to operate on the byte arrays right from the start.
Please see the attached patch. It doesn't convert the bytes it reads to a 
String, and it introduces a code path that inserts a node based on the byte[][] 
representation right from the start. Let me know if you have further comments.
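
To illustrate the idea of working on bytes from the start, here is a minimal 
sketch of splitting a raw path into byte[][] components without going through 
String. This is not the code from the attached patch: the class and method 
names (BytePathSplitter, split) are hypothetical, and it assumes the path 
bytes are UTF-8 with '/' as the separator. Splitting on the raw byte is safe 
under that assumption because 0x2F never occurs inside a multi-byte UTF-8 
sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of HDFS or the patch. */
final class BytePathSplitter {
    /**
     * Splits raw path bytes on '/' without decoding them to a String.
     * A leading '/' yields an empty first component (the root);
     * a trailing '/' does not produce an extra empty component.
     */
    static byte[][] split(byte[] path) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < path.length; i++) {
            if (path[i] == '/') {
                parts.add(Arrays.copyOfRange(path, start, i));
                start = i + 1;
            }
        }
        // Add the last component, or a single empty one for an empty path.
        if (start < path.length || parts.isEmpty()) {
            parts.add(Arrays.copyOfRange(path, start, path.length));
        }
        return parts.toArray(new byte[0][]);
    }
}
```

With this sketch, the bytes of "/user/alice" split into three components: an 
empty root component, "user", and "alice".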

> Speedup INode.getPathComponents
> -------------------------------
>
>                 Key: HDFS-1140
>                 URL: https://issues.apache.org/jira/browse/HDFS-1140
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-1140.2.patch, HDFS-1140.patch
>
>
> When the namenode is loading the image, a significant amount of time is 
> spent in DFSUtil.string2Bytes. We have a very specific workload here: the 
> path that the namenode calls getPathComponents for shares N - 1 components 
> with the path from the previous call (assuming the current path has N 
> components).
> Hence we can improve the image load time by caching the result of the 
> previous conversion.
> We thought of using a simple LRU cache for components, but in practice 
> String.getBytes gets optimized at runtime and an LRU cache doesn't perform 
> as well. However, keeping just the latest path components and their byte 
> translations in two arrays gives quite a performance boost.
> I could get another 20% off the image load time on our cluster (30 
> seconds vs 24), and I wrote a simple benchmark that tests performance with 
> and without caching.
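
The two-array caching described above could look roughly like the sketch 
below. It is an illustration only, not the code from the attached patches: 
the class name LastPathCache and its method are hypothetical, and UTF-8 is 
assumed as the encoding. It remembers the components of the most recent path 
together with their byte translations and converts a component only when it 
differs from the one at the same position in the previous path.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical single-entry cache for illustration; not thread-safe. */
final class LastPathCache {
    // Components of the most recently converted path ...
    private String[] lastNames = new String[0];
    // ... and their byte translations, index-aligned with lastNames.
    private byte[][] lastBytes = new byte[0][];

    /**
     * Converts path components to bytes, reusing the byte array from the
     * previous call wherever the component at the same position is equal.
     * Returned inner arrays may be shared between calls, so callers must
     * treat them as read-only.
     */
    byte[][] toBytes(String[] names) {
        byte[][] result = new byte[names.length][];
        for (int i = 0; i < names.length; i++) {
            if (i < lastNames.length && names[i].equals(lastNames[i])) {
                result[i] = lastBytes[i]; // hit: same component as last time
            } else {
                result[i] = names[i].getBytes(StandardCharsets.UTF_8);
            }
        }
        // Keep private copies of the outer arrays for the next call.
        lastNames = names.clone();
        lastBytes = result.clone();
        return result;
    }
}
```

For two consecutive paths such as /user/alice/f1 and /user/alice/f2, only the 
last component is converted again on the second call; the first N - 1 byte 
arrays are reused as they are.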

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
