[ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954497#comment-16954497
 ] 

Jinglun commented on HDFS-14908:
--------------------------------

I ran a benchmark; the results are below. The test file is Test.java.

 

*Design*

Compare path.startsWith(parent) with DFSUtil.isParent(path, parent), both when 
path starts with parent and when it does not.
{quote}java -Xmx512m Test 10000000000 /dir0/dir1/dir2/dir3 /dir0/dir1

java -Xmx512m Test 10000000000 /dir0/dir1/dir2/dir3 /dir0/dir2

java -Xmx512m Test 10000000000 dir0/dir1/dir2/dir3 dir0/dir1

java -Xmx512m Test 10000000000 /dir0/dir1/dir2/dir3/ /dir0/dir1/
{quote}
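
Test.java is not reproduced in this message, so the following is only a rough sketch of the kind of harness the commands above could drive, assuming the arguments are (iterations, path, parent). The names PrefixBench, countMatches and timeMillis are hypothetical and not taken from the attachment.

```java
import java.util.function.BiPredicate;

// Hypothetical harness; an assumption about what a Test.java like this could look like.
final class PrefixBench {

    /** Runs check(path, parent) `iterations` times and returns how often it was true. */
    static long countMatches(BiPredicate<String, String> check,
                             String path, String parent, long iterations) {
        long matches = 0;
        for (long i = 0; i < iterations; i++) {
            // Using the result gives the JIT less room to drop the call entirely.
            if (check.test(path, parent)) {
                matches++;
            }
        }
        return matches;
    }

    /** Times one run and prints elapsed milliseconds plus the match count. */
    static long timeMillis(String label, BiPredicate<String, String> check,
                           String path, String parent, long iterations) {
        long start = System.nanoTime();
        long matches = countMatches(check, path, parent, iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + elapsedMs + "ms (matches=" + matches + ")");
        return elapsedMs;
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: PrefixBench <iterations> <path> <parent>");
            return;
        }
        long iterations = Long.parseLong(args[0]);
        String path = args[1];
        String parent = args[2];
        BiPredicate<String, String> startsWith = (p, q) -> p.startsWith(q);
        // Short warm-up pass before the timed pass.
        countMatches(startsWith, path, parent, 1_000_000);
        timeMillis("startsWith()", startsWith, path, parent, iterations);
        // With Hadoop on the classpath, DFSUtil.isParent(path, parent) could be
        // timed the same way by passing it as the predicate.
    }
}
```

A hand-rolled loop like this is sensitive to JIT decisions, so small differences between the two methods should be read with caution; a dedicated harness such as JMH would give more trustworthy numbers.
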
*Result*

Case 1:

path=    /dir0/dir1/dir2/dir3

parent=/dir0/dir1
||Times||1,000,000,000||10,000,000,000||
|startsWith()|4,095ms|40,657ms|
|isParent()|4,665ms|46,543ms|

Case 2:

path=    /dir0/dir1/dir2/dir3

parent=/dir0/dir2
||Times||1,000,000,000||10,000,000,000||
|startsWith()|4,029ms|39,559ms|
|isParent()|3,984ms|39,095ms|

Case 3:

path=    dir0/dir1/dir2/dir3

parent=dir0/dir1
||Times||1,000,000,000||10,000,000,000||
|startsWith()|3,519ms|34,562ms|
|isParent()|235ms|2,328ms|

Case 4:

path=    /dir0/dir1/dir2/dir3/

parent=/dir0/dir1/
||Times||1,000,000,000||10,000,000,000||
|startsWith()|4,546ms|45,355ms|
|isParent()|19,409ms|189,750ms|

*Conclusion*

The result in Case 1 shows that the overhead of isParent() is modest: about 14% 
over startsWith() (4,665ms vs. 4,095ms for 1,000,000,000 calls). I think it's 
acceptable.

The result in Case 2 is a little confusing, as isParent() is slightly faster 
than startsWith(). I repeated the test many times and the results were the 
same. The method isParent() does several additional checks and then calls 
startsWith(), so it should be slower than calling startsWith() directly. Maybe 
the compiler/JVM has optimized the code?

The result of Case 3 is expected because isParent() skips the 
character-by-character comparison in this case.

The result of Case 4 shows that isParent() takes much longer when path ends 
with a '/' (roughly 4x the time of startsWith()). That's because the strings 
are copied. We can optimize this by introducing a new character-comparison 
method that supports startIndex and endIndex, and using it to replace the 
startsWith() call at L1769.
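
One way to picture the optimization proposed for Case 4 is a check that compares a region of the two strings in place instead of first stripping the trailing '/' into new strings. The sketch below uses String.regionMatches() for that. AncestorCheck, trimmedLength and isAncestor are hypothetical names, and the semantics (one trailing '/' ignored, a path is not its own ancestor, relative paths accepted) are assumptions made for this example; it is not the code in DFSUtil and does not claim to match isParent() on every input.

```java
// Hypothetical, allocation-free ancestor check; not the Hadoop implementation.
final class AncestorCheck {

    /** Length of s with one trailing '/' ignored; the root "/" becomes length 0. */
    static int trimmedLength(String s) {
        int len = s.length();
        return (len > 0 && s.charAt(len - 1) == '/') ? len - 1 : len;
    }

    /** True if parent is a strict ancestor directory of path. */
    static boolean isAncestor(String path, String parent) {
        if (path == null || parent == null || parent.isEmpty()) {
            return false;
        }
        int pathEnd = trimmedLength(path);
        int parentEnd = trimmedLength(parent);
        // The ancestor must be strictly shorter than the path.
        if (parentEnd >= pathEnd) {
            return false;
        }
        // The character right after the parent prefix must be a separator,
        // otherwise "/dir0/dir1" would wrongly match "/dir0/dir10/file".
        if (path.charAt(parentEnd) != '/') {
            return false;
        }
        // Compare path[0, parentEnd) with parent[0, parentEnd) without copying.
        return path.regionMatches(0, parent, 0, parentEnd);
    }
}
```

Under these assumptions, isAncestor("/dir0/dir1/dir2/dir3/", "/dir0/dir1/") is true without creating any substring, and isAncestor("/dir0/dir10/file", "/dir0/dir1") is false.
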

> LeaseManager should check parent-child relationship when filter open files.
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-14908
>                 URL: https://issues.apache.org/jira/browse/HDFS-14908
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.0.1
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Minor
>         Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch
>
>
> Now when doing listOpenFiles(), LeaseManager only checks whether the filter 
> path is the prefix of the open files. We should check whether the filter path 
> is the parent/ancestor of the open files.
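
To make the difference between a prefix check and a parent/ancestor check concrete, here is a small illustration. PrefixVsAncestor, prefixMatch and ancestorMatch are hypothetical names chosen for this example, and the paths are made up; this is not the LeaseManager code.

```java
// Hypothetical illustration of prefix matching vs. ancestor matching.
final class PrefixVsAncestor {

    /** Plain prefix check on the raw strings. */
    static boolean prefixMatch(String openFile, String filter) {
        return openFile.startsWith(filter);
    }

    /** Ancestor check: the filter must be followed by a path separator. */
    static boolean ancestorMatch(String openFile, String filter) {
        String dir = filter.endsWith("/") ? filter : filter + "/";
        return openFile.startsWith(dir);
    }

    public static void main(String[] args) {
        String filter = "/dir0/dir1";
        String sibling = "/dir0/dir10/file"; // lives under dir10, not dir1
        String child = "/dir0/dir1/file";

        System.out.println(prefixMatch(sibling, filter));   // true  (false positive)
        System.out.println(ancestorMatch(sibling, filter)); // false
        System.out.println(ancestorMatch(child, filter));   // true
    }
}
```
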


