[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated MAPREDUCE-1981:
-------------------------------------
Attachment: mapredListFiles4.patch
This patch fixes two failing unit tests: TestCombineFileInputFormat and
TestHarFileSystem.
For the first test, I found that my patch made a subtle change to
CombineFileInputFormat. The path filter in CombineFileInputFormat assumes that
the path passed to the filter does not include the scheme, hostname, etc. This
differs from the other input formats, which make no assumption about the Path
format. My patch removes this restriction, so I had to modify
DummyFileInputFormat in TestCombineFileInputFormat to drop this assumption.
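To illustrate the kind of assumption involved, here is a minimal sketch of a path filter that behaves the same whether the incoming path is fully qualified or a bare path. It is a hypothetical example, not the actual DummyFileInputFormat or TestCombineFileInputFormat code: the class name PrefixPathFilter is made up, and it works on plain strings through java.net.URI so that it runs without Hadoop on the classpath. In Hadoop code the analogous step would be comparing path.toUri().getPath() of an org.apache.hadoop.fs.Path.

```java
import java.net.URI;

// Hypothetical filter for illustration only (not the TestCombineFileInputFormat code).
// It accepts paths at or below a directory, whether the incoming path is fully
// qualified (hdfs://host:port/dir/file) or a bare path (/dir/file).
final class PrefixPathFilter {
    private final String dirPath;

    PrefixPathFilter(String dir) {
        // Keep only the path component (drop scheme and authority) and strip a
        // trailing slash so "/a/b/" and "/a/b" behave the same.
        String p = URI.create(dir).getPath();
        this.dirPath = p.endsWith("/") ? p.substring(0, p.length() - 1) : p;
    }

    boolean accept(String path) {
        // Compare on the path component only, so "hdfs://nn:8020/a/b" and "/a/b"
        // are treated identically.
        String p = URI.create(path).getPath();
        return p.equals(dirPath) || p.startsWith(dirPath + "/");
    }

    public static void main(String[] args) {
        PrefixPathFilter f = new PrefixPathFilter("/user/test/dir1");
        System.out.println(f.accept("/user/test/dir1/file1"));                      // true
        System.out.println(f.accept("hdfs://localhost:8020/user/test/dir1/file1")); // true
        System.out.println(f.accept("hdfs://localhost:8020/user/test/dir2/file1")); // false
    }
}
```

Matching on the path component alone is what lets the same filter accept both forms; a filter that compares full strings would silently reject qualified paths.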
For the second test, it turns out that HarFileSystem did not have a correct
implementation of listLocatedStatus, so this patch adds one.
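For readers unfamiliar with listLocatedStatus: it returns each entry of a directory listing together with its block locations. The sketch below shows the generic way to build such a listing from a plain listing plus one block-location lookup per file. All types here (Status, Block, LocatedStatus, SimpleFs, InMemoryFs) are hypothetical simplifications of Hadoop's FileStatus, BlockLocation, LocatedFileStatus and FileSystem; this is not the HarFileSystem code from the patch, and unlike Hadoop's RemoteIterator the iterator here is built eagerly.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for Hadoop's FileStatus, BlockLocation and
// LocatedFileStatus; only the fields the sketch needs are kept.
record Status(String path, long len, boolean isDir) {}
record Block(long offset, long length, List<String> hosts) {}
record LocatedStatus(Status status, List<Block> blocks) {}

interface SimpleFs {
    List<Status> listStatus(String dir);

    List<Block> getBlockLocations(Status file, long start, long len);

    // Generic fallback: list the directory, then fetch block locations for each
    // file entry. Directories get an empty block list. The result is built
    // eagerly here, whereas Hadoop returns a lazy RemoteIterator.
    default Iterator<LocatedStatus> listLocatedStatus(String dir) {
        List<LocatedStatus> out = new ArrayList<>();
        for (Status s : listStatus(dir)) {
            List<Block> blocks = s.isDir() ? List.of() : getBlockLocations(s, 0, s.len());
            out.add(new LocatedStatus(s, blocks));
        }
        return out.iterator();
    }
}

// Toy in-memory file system used to exercise the default method.
final class InMemoryFs implements SimpleFs {
    private final Map<String, List<Status>> dirs;
    private final Map<String, List<Block>> blocksByPath;

    InMemoryFs(Map<String, List<Status>> dirs, Map<String, List<Block>> blocksByPath) {
        this.dirs = dirs;
        this.blocksByPath = blocksByPath;
    }

    @Override
    public List<Status> listStatus(String dir) {
        return dirs.getOrDefault(dir, List.of());
    }

    @Override
    public List<Block> getBlockLocations(Status file, long start, long len) {
        // Returns every block of the file; the [start, start+len) range is
        // ignored for brevity.
        return blocksByPath.getOrDefault(file.path(), List.of());
    }

    public static void main(String[] args) {
        Status a = new Status("/d/a", 10, false);
        Status sub = new Status("/d/sub", 0, true);
        SimpleFs fs = new InMemoryFs(
            Map.of("/d", List.of(a, sub)),
            Map.of("/d/a", List.of(new Block(0, 10, List.of("host1")))));
        for (Iterator<LocatedStatus> it = fs.listLocatedStatus("/d"); it.hasNext(); ) {
            LocatedStatus ls = it.next();
            System.out.println(ls.status().path() + " -> " + ls.blocks().size() + " block(s)");
        }
    }
}
```

A layered file system such as a HAR archive has to make sure that both the listing and the block lookup resolve paths in its own namespace rather than handing archive paths straight to the underlying file system.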
> Improve getSplits performance by using listFiles, the new FileSystem API
> ------------------------------------------------------------------------
>
> Key: MAPREDUCE-1981
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: job submission
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Fix For: 0.22.0
>
> Attachments: mapredListFiles.patch, mapredListFiles1.patch,
> mapredListFiles2.patch, mapredListFiles3.patch, mapredListFiles4.patch
>
>
> This jira will make FileInputFormat and CombineFileInputFormat use the new
> API, thus reducing the number of RPCs to the HDFS NameNode.
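A rough estimate of why this reduces NameNode RPCs, assuming the old getSplits path issues one listing call per batch of directory entries plus one getFileBlockLocations call per file, while the new API returns block locations together with the listing. The batch size of 1,000 entries per listing call is an assumption chosen for illustration (check the actual HDFS listing limit in your configuration), and the class and method names are hypothetical.

```java
// Rough, hypothetical model of NameNode RPCs needed to compute splits for one
// directory of `files` entries, assuming listings are paged `batchSize` entries
// per call.
final class RpcEstimate {
    // Listing calls needed to page through the directory (at least one).
    static long listingCalls(long files, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return Math.max(1, (files + batchSize - 1) / batchSize);
    }

    // Old approach: list the directory, then one getFileBlockLocations call per file.
    static long oldApproach(long files, long batchSize) {
        return listingCalls(files, batchSize) + files;
    }

    // New approach: block locations come back with the listing itself.
    static long newApproach(long files, long batchSize) {
        return listingCalls(files, batchSize);
    }

    public static void main(String[] args) {
        long files = 10_000;
        long batchSize = 1_000; // assumed listing batch size
        System.out.println(oldApproach(files, batchSize)); // 10010
        System.out.println(newApproach(files, batchSize)); // 10
    }
}
```

Under these assumptions a directory of 10,000 files drops from 10,010 calls to 10; the real savings depend on how the input is spread across directories and on the cluster's listing limit.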
--
This message is automatically generated by JIRA.