[ 
https://issues.apache.org/jira/browse/HADOOP-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208638#comment-13208638
 ] 

Hudson commented on HADOOP-6502:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #1729 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1729/])
    HADOOP-6502. Improve the performance of Configuration.getClassByName when 
the class is not found by caching negative results. Contributed by Sharad 
Agarwal and Todd Lipcon. (Revision 1244620)

     Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244620
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ReflectionUtils.java

                
> DistributedFileSystem#listStatus is very slow when listing a directory with a 
> size of 1300
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6502
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6502
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Assignee: Sharad Agarwal
>            Priority: Critical
>             Fix For: 0.24.0, 0.23.2
>
>         Attachments: 6502.patch, 6502_v2.patch, hadoop-6502-trunk.txt, 
> hadoop-6502-trunk.txt
>
>
> When listing a directory of around 1300 children, listStatus takes hundreds
> of milliseconds. The slowdown is caused by the change made in HADOOP-4187.
> The return value of listStatus is an array of FileStatus. When deserializing
> each element of the array, ReflectionUtils#newInstance(Class<T>,
> Configuration) is called, which calls setConf, which in turn calls
> setJobConf. setJobConf checks whether JobConf is on the class path by
> calling Configuration#getClassByName. Although Configuration#getClassByName
> tries to optimize the lookup with a cached map, JobConf is not on the class
> path, so it never enters the cache. Every lookup therefore ends up calling
> Class.forName, which is very expensive. Deserializing an array of 1300
> entries requires calling Class#forName 1300 times!
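
For readers who want to see the idea of "caching negative results" in code,
here is a minimal sketch. It is not the committed patch: the class name
NegativeClassCache, the lookup method, the NotFoundMarker sentinel, and the
loadAttempts counter are all hypothetical names chosen for illustration. The
sketch assumes a failed lookup can be remembered by storing a marker class in
the same map that holds successful lookups.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; names here are hypothetical, not Hadoop's. */
public class NegativeClassCache {
    /** Private marker type stored for names that failed to load. */
    private static final class NotFoundMarker {}

    private static final Class<?> NOT_FOUND = NotFoundMarker.class;

    private final Map<String, Class<?>> cache = new ConcurrentHashMap<>();
    private final ClassLoader loader;
    // Counts real Class.forName calls so the effect of the cache is visible.
    private final AtomicInteger loadAttempts = new AtomicInteger();

    public NegativeClassCache(ClassLoader loader) {
        this.loader = loader;
    }

    /** Returns the class, or null if it cannot be found. */
    public Class<?> lookup(String name) {
        Class<?> cached = cache.get(name);
        if (cached == null) {
            // First time this name is seen: pay for the expensive lookup once.
            loadAttempts.incrementAndGet();
            try {
                cached = Class.forName(name, true, loader);
            } catch (ClassNotFoundException e) {
                // Remember the failure so later calls skip Class.forName.
                cached = NOT_FOUND;
            }
            cache.put(name, cached);
        }
        return cached == NOT_FOUND ? null : cached;
    }

    public int loadAttempts() {
        return loadAttempts.get();
    }

    public static void main(String[] args) {
        NegativeClassCache c =
            new NegativeClassCache(NegativeClassCache.class.getClassLoader());
        // Mimic deserializing 1300 entries when the class is absent.
        for (int i = 0; i < 1300; i++) {
            c.lookup("org.example.DoesNotExist");
        }
        System.out.println(c.loadAttempts()); // prints 1
    }
}
```

With the marker in place, 1300 lookups of a missing class cost one
Class.forName call instead of 1300. Note that two threads racing on the same
new name may both attempt the load; the sketch accepts that duplicate work
rather than locking.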

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
