[ https://issues.apache.org/jira/browse/MAPREDUCE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hemanth Yamijala updated MAPREDUCE-1466:
----------------------------------------
Attachment: MAPREDUCE-1466_yhadoop20-1.patch
The newly attached patch makes the following minor changes to the earlier one:
- Removed a System.err.println call from the old-API FileInputFormat. The same
information (the number of paths to process) is already reported by a log
statement in getSplits.
- Removed a duplicate call to listStatus in the new-API FileInputFormat. The
code looked like this:
{code}
+ List<FileStatus>files = listStatus(job);
for (FileStatus file: listStatus(job)) {
{code}
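For illustration, here is a Hadoop-free sketch of the corrected pattern. It lists the inputs once, logs the count, and then iterates over the saved list. The `listStatus` and `getSplits` methods below are hypothetical stand-ins and not the actual FileInputFormat code. Plain path strings replace FileStatus objects, and java.util.logging replaces Hadoop's logger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: listStatus() and getSplits() only stand in for the
 * FileInputFormat methods; "files" are plain path strings, not FileStatus.
 */
class SingleListingSketch {
  private static final Logger LOG =
      Logger.getLogger(SingleListingSketch.class.getName());

  // Counts how often the (potentially expensive) listing is performed.
  static int listStatusCalls = 0;

  // Stand-in for listStatus(job); in Hadoop this walks the input directories.
  static List<String> listStatus(List<String> inputPaths) {
    listStatusCalls++;
    return new ArrayList<>(inputPaths);
  }

  // Stand-in for getSplits(job): list once, then reuse the result.
  static List<String> getSplits(List<String> inputPaths) {
    List<String> files = listStatus(inputPaths); // the only listing
    // Report the count through a logger instead of System.err.
    LOG.info("Total input paths to process : " + files.size());
    List<String> splits = new ArrayList<>();
    for (String file : files) { // iterate the saved list; no second listStatus
      splits.add(file);         // one "split" per file, for simplicity
    }
    return splits;
  }

  public static void main(String[] args) {
    List<String> splits = getSplits(Arrays.asList("/in/a", "/in/b", "/in/c"));
    System.out.println(splits.size() + " splits, listStatus called "
        + listStatusCalls + " time(s)");
  }
}
```

The point is that one listing feeds both the count and the loop, so the file system is walked only once per getSplits call.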
I also think we need test cases for the new API. However, there are currently
no tests for any of the classes in the org.apache.hadoop.mapreduce.lib.input
package, so this should possibly be handled in a separate JIRA.
Please let me know if the changes seem fine.
> FileInputFormat should save #input-files in JobConf
> ---------------------------------------------------
>
> Key: MAPREDUCE-1466
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1466
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Priority: Minor
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1466_yhadoop20-1.patch,
> MAPREDUCE-1466_yhadoop20.patch
>
>
> We already track the amount of data consumed by MR applications
> (MAP_INPUT_BYTES); along with that, it would be useful to track #input-files
> from the client side for analysis. Along the lines of MAPREDUCE-1403, it would
> be easy to stick it in the JobConf during job submission.
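The idea in the description can be sketched without Hadoop. At submission time, the client records how many input files were listed in the job configuration, and later analysis reads the value back. In this sketch, `FakeConf` is a hypothetical stand-in for JobConf that mirrors Configuration's `setInt`/`getInt`. The property name `mapreduce.input.num.files` is also an assumption; the key chosen by the actual patch may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for JobConf; mirrors Configuration.setInt/getInt. */
class FakeConf {
  private final Map<String, String> props = new HashMap<>();

  void setInt(String name, int value) {
    props.put(name, Integer.toString(value));
  }

  int getInt(String name, int defaultValue) {
    String v = props.get(name);
    return v == null ? defaultValue : Integer.parseInt(v);
  }
}

class InputFileCountSketch {
  // Hypothetical property name; not necessarily the key used by the patch.
  static final String NUM_INPUT_FILES = "mapreduce.input.num.files";

  /** Called client-side during submission, after the inputs are listed once. */
  static List<String> recordInputFileCount(FakeConf conf, List<String> files) {
    conf.setInt(NUM_INPUT_FILES, files.size());
    return files;
  }

  public static void main(String[] args) {
    FakeConf conf = new FakeConf();
    recordInputFileCount(conf, Arrays.asList("/in/part-0", "/in/part-1"));
    // Later analysis reads the value back from the submitted configuration.
    System.out.println(conf.getInt(NUM_INPUT_FILES, -1));
  }
}
```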