[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated MAPREDUCE-1981:
-------------------------------------
Status: Open (was: Patch Available)
I'm pleased to see this feature propagate to MR. The approach looks correct; I
have just a few comments:
* It looks like this change:
{noformat}
- return result.toArray(new FileStatus[result.size()]);
+ return result.toArray(new LocatedFileStatus[result.size()]);
{noformat}
causes {{TestMapRed}} to fail. {{SequenceFileInputFormat}} (and, presumably,
other subclasses of {{FileInputFormat}}) may rely on the runtime type of the
array returned by {{FileInputFormat}} being {{FileStatus[]}}.
* I think the HDFS fault injection is breaking the publishing of the HDFS
artifact, so the mapred tests currently do not pick up the change to the HDFS
{{ClientProtocol}}, and {{TestSubmitJob}} fails to compile. However, the patch
is current with HDFS trunk, and everything works if fault injection is disabled
before running mvn-install, etc. Is this fault being tracked in HDFS?
* The patch causes {{TestNoDefaultsJobConf}} to fail:
{noformat}
Testcase: testNoDefaults took 4.489 sec
Caused an ERROR
No AbstractFileSystem for scheme: hdfs
org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: hdfs
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:143)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:198)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:394)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:409)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:188)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:234)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:461)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:453)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:354)
    at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1037)
    at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1034)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1030)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1034)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:536)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:781)
    at org.apache.hadoop.conf.TestNoDefaultsJobConf.testNoDefaults(TestNoDefaultsJobConf.java:83)
{noformat}
* Unfortunately, {{FileInputFormat::addInputPathRecursively}} could be
overridden by a user. Either this should be marked as an incompatible change, or
the method should be deprecated with its functionality preserved. It may also be
worth confirming that no test relies on it.
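The {{TestMapRed}} failure in the first point is consistent with Java's covariant arrays: an array allocated as {{LocatedFileStatus[]}} can be returned through a {{FileStatus[]}} signature, but a caller that later stores a plain {{FileStatus}} into it gets an {{ArrayStoreException}} at run time. The sketch below reproduces this with hypothetical stand-in classes {{Status}} and {{LocatedStatus}} (not the Hadoop types), assuming, for illustration, that a subclass replaces entries in the array it gets back from {{listStatus}}.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayCovarianceDemo {
    // Hypothetical stand-ins for FileStatus and LocatedFileStatus.
    static class Status {
        final String path;
        Status(String path) { this.path = path; }
    }

    static class LocatedStatus extends Status {
        LocatedStatus(String path) { super(path); }
    }

    // Like the patched code: declared type Status[], runtime type LocatedStatus[].
    static Status[] listPatched(List<LocatedStatus> result) {
        return result.toArray(new LocatedStatus[result.size()]);
    }

    // Like the original code: runtime type Status[], elements may still be LocatedStatus.
    static Status[] listOriginal(List<LocatedStatus> result) {
        return result.toArray(new Status[result.size()]);
    }

    // A caller that post-processes the listing by replacing an entry with a
    // plain Status. Returns true if the array store succeeded.
    static boolean replaceFirst(Status[] files) {
        try {
            files[0] = new Status(files[0].path + "/data");
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<LocatedStatus> result = new ArrayList<>();
        result.add(new LocatedStatus("/in/part-00000"));

        System.out.println("patched:  " + replaceFirst(listPatched(result)));
        System.out.println("original: " + replaceFirst(listOriginal(result)));
    }
}
```

Running it prints {{patched:  false}} followed by {{original: true}}. Keeping {{new FileStatus[result.size()]}} in the {{toArray}} call avoids the problem while still allowing the elements themselves to be {{LocatedFileStatus}} instances.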
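For the last point, one possible way to deprecate {{addInputPathRecursively}} while preserving its behavior is to keep the new, faster listing as the default but fall back to the old protected hook whenever a subclass has overridden it. The sketch below is a hypothetical, self-contained illustration: {{Lister}}, {{listFilesFast}}, {{SkippingLister}}, and the in-memory directory map are stand-ins, not the actual {{FileInputFormat}} code, and the reflection-based override check is just one option.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class DeprecatedHookDemo {

    /** Hypothetical lister: "dirs" maps each directory to its children; any other path is a file. */
    static class Lister {
        protected final Map<String, List<String>> dirs;

        Lister(Map<String, List<String>> dirs) { this.dirs = dirs; }

        /** @deprecated kept only so that existing subclass overrides still take effect. */
        @Deprecated
        protected void addInputPathRecursively(List<String> result, String dir) {
            for (String child : dirs.getOrDefault(dir, List.of())) {
                if (dirs.containsKey(child)) {
                    addInputPathRecursively(result, child);
                } else {
                    result.add(child);
                }
            }
        }

        /** Stand-in for the new bulk listing path; private, so it cannot be overridden. */
        private List<String> listFilesFast(String root) {
            List<String> result = new ArrayList<>();
            Deque<String> pending = new ArrayDeque<>();
            pending.push(root);
            while (!pending.isEmpty()) {
                for (String child : dirs.getOrDefault(pending.pop(), List.of())) {
                    if (dirs.containsKey(child)) {
                        pending.push(child);
                    } else {
                        result.add(child);
                    }
                }
            }
            return result;
        }

        /** True if some class between getClass() and Lister declares addInputPathRecursively. */
        private boolean hookOverridden() {
            for (Class<?> c = getClass(); c != Lister.class; c = c.getSuperclass()) {
                try {
                    c.getDeclaredMethod("addInputPathRecursively", List.class, String.class);
                    return true;
                } catch (NoSuchMethodException e) {
                    // Not declared at this level; keep walking up the hierarchy.
                }
            }
            return false;
        }

        /** Uses the fast path unless a subclass has customized the old hook. */
        public List<String> listStatus(String root) {
            if (!hookOverridden()) {
                return listFilesFast(root);
            }
            List<String> result = new ArrayList<>();
            addInputPathRecursively(result, root);
            return result;
        }
    }

    /** A "user" subclass that overrides the old hook to drop paths whose name starts with "_". */
    static class SkippingLister extends Lister {
        SkippingLister(Map<String, List<String>> dirs) { super(dirs); }

        @Override
        protected void addInputPathRecursively(List<String> result, String dir) {
            List<String> all = new ArrayList<>();
            super.addInputPathRecursively(all, dir);
            for (String p : all) {
                if (!p.contains("/_")) {
                    result.add(p);
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = Map.of(
            "/in", List.of("/in/a", "/in/_SUCCESS", "/in/sub"),
            "/in/sub", List.of("/in/sub/b"));
        System.out.println(new Lister(tree).listStatus("/in"));         // fast path
        System.out.println(new SkippingLister(tree).listStatus("/in")); // honors the override
    }
}
```

A simpler but slower alternative is to always route directory expansion through the deprecated hook; the override check trades a little reflection for keeping the RPC savings in the common case where nobody has overridden the method.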
> Improve getSplits performance by using listFiles, the new FileSystem API
> ------------------------------------------------------------------------
>
> Key: MAPREDUCE-1981
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: job submission
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Fix For: 0.22.0
>
> Attachments: mapredListFiles.patch, mapredListFiles1.patch,
> mapredListFiles2.patch
>
>
> This jira will make FileInputFormat and CombineFileInputFormat use the new
> API, thus reducing the number of RPCs to the HDFS NameNode.
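The RPC saving described in the issue can be pictured with a toy model: listing a directory and then asking for each file's block locations costs one call for the listing plus one per file, whereas a listing whose entries already carry their block locations (as the {{LocatedFileStatus}} entries returned by {{FileSystem#listFiles}} do) needs only the listing call, ignoring paging of large directories. The sketch below counts calls against a hypothetical in-memory {{FakeNameNode}}; its method names are assumptions and do not mirror the HDFS {{ClientProtocol}}.

```java
import java.util.ArrayList;
import java.util.List;

public class RpcCountDemo {
    /** A file path with its (fake) block hosts. */
    static final class FileInfo {
        final String path;
        final List<String> blockHosts;
        FileInfo(String path, List<String> blockHosts) {
            this.path = path;
            this.blockHosts = blockHosts;
        }
    }

    /** Hypothetical NameNode stand-in that counts every remote call. */
    static class FakeNameNode {
        private final List<FileInfo> files;
        int rpcs = 0;

        FakeNameNode(List<FileInfo> files) { this.files = files; }

        /** Listing without locations (analogous to listStatus). */
        List<String> listPaths() {
            rpcs++;
            List<String> paths = new ArrayList<>();
            for (FileInfo f : files) paths.add(f.path);
            return paths;
        }

        /** Per-file location lookup (analogous to getFileBlockLocations). */
        List<String> getLocations(String path) {
            rpcs++;
            for (FileInfo f : files) {
                if (f.path.equals(path)) return f.blockHosts;
            }
            throw new IllegalArgumentException(path);
        }

        /** Listing whose entries carry their locations (analogous to listFiles). */
        List<FileInfo> listLocated() {
            rpcs++;
            return new ArrayList<>(files);
        }
    }

    /** Old style: one listing call, then one location call per file. */
    static int oldStyleRpcs(FakeNameNode nn) {
        nn.rpcs = 0;
        for (String p : nn.listPaths()) {
            nn.getLocations(p);
        }
        return nn.rpcs;
    }

    /** New style: a single located listing; locations need no extra calls. */
    static int newStyleRpcs(FakeNameNode nn) {
        nn.rpcs = 0;
        for (FileInfo f : nn.listLocated()) {
            if (f.blockHosts.isEmpty()) throw new IllegalStateException(f.path);
        }
        return nn.rpcs;
    }

    public static void main(String[] args) {
        List<FileInfo> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            files.add(new FileInfo("/in/part-" + i, List.of("host" + (i % 3))));
        }
        FakeNameNode nn = new FakeNameNode(files);
        System.out.println("listStatus + getFileBlockLocations: " + oldStyleRpcs(nn));
        System.out.println("listFiles: " + newStyleRpcs(nn));
    }
}
```

With 100 files in one directory, {{main}} reports 101 calls for the first strategy and 1 for the second; real savings depend on directory sizes and on how the NameNode pages large listings.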