[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated MAPREDUCE-1981:
-------------------------------------

    Attachment: mapredListFiles4.patch

This patch fixes two failing unit tests: TestCombineFileInputFormat and 
TestHarFileSystem.

For the first test, I found that my patch makes a subtle change to 
CombineFileInputFormat. The path filter in CombineFileInputFormat assumed that 
the path passed to the filter does not include the scheme, hostname, etc. This 
differs from the other input formats, which make no assumption about the 
Path format. My patch removes this restriction, so I had to modify 
DummyFileInputFormat in TestCombineFileInputFormat to drop this assumption.
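
To illustrate the kind of assumption involved, here is a minimal sketch in plain Java. It is not the code from the patch and it does not use Hadoop's Path or PathFilter classes; the class and method names (PathFilterSketch, underDirectory, underDirectoryRaw) and the host name are hypothetical. It contrasts a filter that only works on bare paths with one that looks at the path component and so gives the same answer for fully qualified paths.

```java
import java.net.URI;
import java.util.function.Predicate;

class PathFilterSketch {
    // Hypothetical filter that compares only the path component, so
    // "/data/a.txt" and "hdfs://host:8020/data/a.txt" are treated alike.
    public static Predicate<String> underDirectory(String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        return p -> URI.create(p).getPath().startsWith(prefix);
    }

    // Fragile variant: assumes the input never carries a scheme or
    // hostname, so it rejects fully qualified paths.
    public static Predicate<String> underDirectoryRaw(String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        return p -> p.startsWith(prefix);
    }

    public static void main(String[] args) {
        String bare = "/data/a.txt";
        String qualified = "hdfs://nn.example.com:8020/data/a.txt";
        System.out.println(underDirectory("/data").test(bare));
        System.out.println(underDirectory("/data").test(qualified));
        System.out.println(underDirectoryRaw("/data").test(bare));
        System.out.println(underDirectoryRaw("/data").test(qualified));
    }
}
```

A test filter written like underDirectoryRaw would start rejecting inputs once the framework passes it qualified paths, which is the sort of breakage described above.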

For the second test, it turns out HarFileSystem does not have a correct 
implementation of listLocatedStatus. So this patch adds one.
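
As background on why listLocatedStatus matters here: it returns each file's status together with its block locations, so a caller does not need a separate block-location lookup per file. The toy model below is only a sketch under stated assumptions. ToyNameNode and its methods are hypothetical stand-ins, not Hadoop classes, and it counts a located listing as a single call, whereas a real listing of a large directory may take more than one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ListingRpcSketch {
    // Hypothetical stand-in for a NameNode: maps file name to block hosts
    // and counts every simulated remote call.
    static class ToyNameNode {
        private final Map<String, List<String>> files = new LinkedHashMap<>();
        int rpcCount = 0;

        void addFile(String name, List<String> hosts) {
            files.put(name, hosts);
        }

        // One call returns the file names only.
        List<String> listStatus() {
            rpcCount++;
            return new ArrayList<>(files.keySet());
        }

        // One call per file to fetch its block hosts.
        List<String> getBlockLocations(String name) {
            rpcCount++;
            return files.get(name);
        }

        // One call returns names and block hosts together.
        Map<String, List<String>> listLocatedStatus() {
            rpcCount++;
            return new LinkedHashMap<>(files);
        }
    }

    // Old pattern: list the directory, then ask for locations file by file.
    public static int rpcsWithSeparateCalls(ToyNameNode nn) {
        int before = nn.rpcCount;
        for (String f : nn.listStatus()) {
            nn.getBlockLocations(f);
        }
        return nn.rpcCount - before;
    }

    // New pattern: a single located listing.
    public static int rpcsWithLocatedListing(ToyNameNode nn) {
        int before = nn.rpcCount;
        nn.listLocatedStatus();
        return nn.rpcCount - before;
    }

    public static ToyNameNode sample() {
        ToyNameNode nn = new ToyNameNode();
        nn.addFile("part-0", Arrays.asList("host1", "host2"));
        nn.addFile("part-1", Arrays.asList("host2", "host3"));
        nn.addFile("part-2", Arrays.asList("host1", "host3"));
        return nn;
    }

    public static void main(String[] args) {
        System.out.println(rpcsWithSeparateCalls(sample()));
        System.out.println(rpcsWithLocatedListing(sample()));
    }
}
```

In this model the separate-call pattern costs one call plus one per file, while the located listing costs one call regardless of the number of files.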

> Improve getSplits performance by using listFiles, the new FileSystem API
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1981
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>
>         Attachments: mapredListFiles.patch, mapredListFiles1.patch, 
> mapredListFiles2.patch, mapredListFiles3.patch, mapredListFiles4.patch
>
>
> This jira will make FileInputFormat and CombineFileInputFormat use the new 
> API, thus reducing the number of RPCs to the HDFS NameNode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
