[ https://issues.apache.org/jira/browse/HADOOP-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-2158:
----------------------------------

    Status: Patch Available  (was: Open)

Promoting this for Christian.

> hdfsListDirectory in libhdfs does not scale
> -------------------------------------------
>
>                 Key: HADOOP-2158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2158
>             Project: Hadoop
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 0.15.0
>            Reporter: Christian Kunz
>            Assignee: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.15.2
>
>         Attachments: 2158.patch
>
>
> hdfsListDirectory makes one rpc call using deprecated fs.FileSystem.listPaths, and then two rpc calls for every entry in the returned array. When running a job with more than 3000 mappers each running a pipes application using libhdfs to scan a dfs directory with about 100-200 entries, this results in about 1M rpc calls to the namenode server overwhelming it.
> hdfsListDirectory should call fs.FileSystem.listStatus instead.
> I will submit a patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
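
The "about 1M rpc calls" figure in the issue description can be checked with some quick arithmetic. The sketch below is only a back-of-the-envelope model, not code from 2158.patch: the function names are hypothetical, it assumes each mapper lists the directory exactly once, and it assumes a listStatus-based listing costs a single rpc call per directory.

```python
# Back-of-the-envelope model of namenode load from hdfsListDirectory.
# Assumptions (not taken from the patch): each mapper lists the directory
# once, and a listStatus-based listing needs one rpc call per directory.

def rpcs_per_listing_old(entries: int) -> int:
    """One listPaths call, then two rpc calls for every returned entry."""
    return 1 + 2 * entries


def rpcs_per_listing_new() -> int:
    """Assumed cost of a listStatus-based listing: a single rpc call."""
    return 1


def total_rpcs(mappers: int, rpcs_per_listing: int) -> int:
    """Total rpc calls hitting the namenode across all mappers."""
    return mappers * rpcs_per_listing


if __name__ == "__main__":
    mappers = 3000
    for entries in (100, 200):
        old = total_rpcs(mappers, rpcs_per_listing_old(entries))
        new = total_rpcs(mappers, rpcs_per_listing_new())
        print(f"{entries} entries: old={old} rpc calls, new={new} rpc calls")
```

With 3000 mappers the old behaviour gives 603,000 calls for 100 entries and 1,203,000 for 200 entries, which matches the "about 1M" estimate. Under the single-call assumption the total drops to 3000.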