[ https://issues.apache.org/jira/browse/HADOOP-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur reopened HADOOP-2158:
--------------------------------------

Upload patch into the 0.15 branch as well.

> hdfsListDirectory in libhdfs does not scale
> -------------------------------------------
>
>                 Key: HADOOP-2158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2158
>             Project: Hadoop
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 0.15.0
>            Reporter: Christian Kunz
>            Assignee: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.16.0
>
>         Attachments: 2158.patch
>
>
> hdfsListDirectory makes one rpc call using deprecated fs.FileSystem.listPaths, and then two rpc calls for every entry in the returned array. When running a job with more than 3000 mappers each running a pipes application using libhdfs to scan a dfs directory with about 100-200 entries, this results in about 1M rpc calls to the namenode server overwhelming it.
> hdfsListDirectory should call fs.FileSystem.listStatus instead.
> I will submit a patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.