[jira] Commented: (HADOOP-6502) DistributedFileSystem#listStatus is very slow when listing a directory with a size of 1300

Hairong Kuang (JIRA) Mon, 25 Jan 2010 11:57:59 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804678#action_12804678
 ]


Hairong Kuang commented on HADOOP-6502:
---------------------------------------

> but I'm not sure we need to do anything here at all... 

From the dfs point of view, of course, this should be fixed. It seems so weird 
that map/reduce related stuff would affect the performance of hdfs.

+1 on caching negative lookups. In general, this should improve newInstance 
performance whenever the class lookup fails. 
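
For illustration only, here is a minimal sketch of what caching negatives could 
look like. It is not the actual patch and not Hadoop's Configuration code: the 
class NegativeClassCache, its method getClassByNameOrNull, and the call counter 
are hypothetical names made up for this example. The idea is to remember failed 
lookups as well as successful ones, so a missing class costs one Class.forName 
call instead of one per deserialized element.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not Hadoop API): caches both found and missing classes. */
final class NegativeClassCache {
    // Optional.empty() records a failed lookup; ConcurrentHashMap cannot store null.
    private final Map<String, Optional<Class<?>>> cache = new ConcurrentHashMap<>();
    // Counts real Class.forName calls, only to make the effect observable.
    private final AtomicInteger forNameCalls = new AtomicInteger();
    private final ClassLoader loader;

    NegativeClassCache(ClassLoader loader) {
        this.loader = loader;
    }

    /** Returns the class, or null if it is not on the class path. */
    Class<?> getClassByNameOrNull(String name) {
        Optional<Class<?>> cached = cache.get(name);
        if (cached == null) {
            // First lookup for this name: pay for Class.forName once.
            // Two threads may race here and both load; that is harmless.
            forNameCalls.incrementAndGet();
            try {
                Class<?> loaded = Class.forName(name, false, loader);
                cached = Optional.<Class<?>>of(loaded);
            } catch (ClassNotFoundException e) {
                cached = Optional.empty(); // remember the miss
            }
            cache.put(name, cached);
        }
        return cached.orElse(null);
    }

    int forNameCalls() {
        return forNameCalls.get();
    }
}
```

With a cache like this, 1300 lookups of a class that is not on the class path 
would trigger a single Class.forName call. One trade-off of this sketch is that 
a class added to the class path later stays "missing" until the entry is 
evicted; the sketch does not handle that.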

> DistributedFileSystem#listStatus is very slow when listing a directory with a 
> size of 1300
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6502
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6502
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Priority: Critical
>             Fix For: 0.20.2, 0.21.0, 0.22.0
>
>
> When listing a directory of around 1300 children, listStatus takes hundreds 
> of milliseconds. The slowdown is caused by the change made in 
> HADOOP-4187. The return value of listStatus is an array of FileStatus. When 
> deserializing each element of the array, 
> ReflectionUtils#newInstance(Class<T>, Configuration) is called, which calls 
> setConf, which in turn calls setJobConf. setJobConf checks whether JobConf is 
> on the class path by calling Configuration#getClassByName. Although 
> Configuration#getClassByName tries to optimize the lookup using a cached map, 
> JobConf is not on the class path, so it is never added to the cache. Every 
> lookup ends up calling Class.forName, which is very expensive. Deserializing 
> an array of 1300 entries requires calling Class#forName 1300 times!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
