[ https://issues.apache.org/jira/browse/HADOOP-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804052#action_12804052 ]
Steve Loughran commented on HADOOP-6502:
----------------------------------------

You could always just cache the special case of JobConf not found, using a single flag rather than adding special not-found entries to the weak <name, class> hashmap. Then it becomes something that can be pulled when JobConf goes away, rather than another feature that needs to be retained forever because of the risk of other code relying on it.

> DistributedFileSystem#listStatus is very slow when listing a directory with a size of 1300
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6502
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6502
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Priority: Critical
>             Fix For: 0.20.2, 0.21.0, 0.22.0
>
>
> When listing a directory of around 1300 children, it takes hundreds of
> milliseconds. It turns out the slowdown is caused by the change made by
> HADOOP-4187. The return value of listStatus is an array of FileStatus. When
> deserializing each element of the array,
> ReflectionUtils#newInstance(Class<T>, Configuration) is called, which then calls
> setConf, which calls setJobConf. setJobConf checks whether JobConf is on the class
> path by calling Configuration#getClassByName. Configuration#getClassByName tries
> to optimize the lookup using a cached map, but since JobConf is not on the class
> path, it is never in the cache. Every check therefore ends up calling
> Class.forName, which is very expensive. Deserializing an array of 1300 entries
> requires calling Class#forName 1300 times!

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
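For illustration, the single-flag idea from the comment could look roughly like the sketch below. This is not the Hadoop patch: the class name MissingClassFlag, its methods, and the call counter are all hypothetical, and the sketch assumes the class cannot appear on the class path later in the life of the process. Only the not-found result is remembered; a successful lookup is left to whatever positive cache already exists.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper: remembers only the "class not found" outcome
 * for one class name, using a single boolean flag.
 */
final class MissingClassFlag {
    private final String className;
    // Set once the lookup has failed; later calls skip Class.forName.
    private volatile boolean knownMissing;
    // Counts real Class.forName calls, only to make the effect visible.
    private final AtomicInteger forNameCalls = new AtomicInteger();

    MissingClassFlag(String className) {
        this.className = className;
    }

    /** Returns the class, or null if it is not on the class path. */
    Class<?> lookup() {
        if (knownMissing) {
            return null; // cheap path: no class loading attempted
        }
        forNameCalls.incrementAndGet();
        try {
            // Positive results are not cached here (assumption: an
            // existing cache such as the weak map handles that case).
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // A race between threads may repeat the failed lookup a few
            // times before the flag is visible; that is harmless.
            knownMissing = true;
            return null;
        }
    }

    int forNameCalls() {
        return forNameCalls.get();
    }
}
```

With a flag like this, deserializing 1300 entries when the class is absent would attempt Class.forName once instead of 1300 times, and the flag can be deleted outright once the JobConf check itself is removed.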