[ https://issues.apache.org/jira/browse/HADOOP-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated HADOOP-4927:
---------------------------------
Issue Type: New Feature (was: Bug)
Changing this from a bug to a feature request. It seems reasonable for
FileOutputFormat to support a mode where files are created lazily when the
first record is written.
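A minimal sketch of the lazy mode suggested above. It assumes a wrapper that defers opening the real writer until the first write. RecordSink, SinkFactory, LazyRecordSink and CountingFactory are hypothetical names used only for illustration. They are not Hadoop's RecordWriter/OutputFormat API, and an in-memory buffer stands in for a part file on the output filesystem.

```java
import java.io.IOException;

public class LazyWriterDemo {

    // Hypothetical minimal stand-in for a record writer; NOT Hadoop's RecordWriter API.
    interface RecordSink<K, V> {
        void write(K key, V value) throws IOException;
        void close() throws IOException;
    }

    // Hypothetical factory that would create the part file on the output filesystem.
    interface SinkFactory<K, V> {
        RecordSink<K, V> open() throws IOException;
    }

    // Wraps a factory and defers open() until the first write() call.
    static final class LazyRecordSink<K, V> implements RecordSink<K, V> {
        private final SinkFactory<K, V> factory;
        private RecordSink<K, V> delegate; // stays null if the task never writes

        LazyRecordSink(SinkFactory<K, V> factory) {
            this.factory = factory;
        }

        @Override
        public void write(K key, V value) throws IOException {
            if (delegate == null) {
                delegate = factory.open(); // the "part file" is created on the first record
            }
            delegate.write(key, value);
        }

        @Override
        public void close() throws IOException {
            if (delegate != null) { // no record written: nothing was created, nothing to close
                delegate.close();
            }
        }

        boolean opened() {
            return delegate != null;
        }
    }

    // In-memory factory that simulates part files and counts how many were created.
    static final class CountingFactory implements SinkFactory<String, Integer> {
        int created = 0;
        final StringBuilder contents = new StringBuilder();

        @Override
        public RecordSink<String, Integer> open() {
            created++;
            return new RecordSink<String, Integer>() {
                @Override
                public void write(String key, Integer value) {
                    contents.append(key).append('\t').append(value).append('\n');
                }

                @Override
                public void close() { }
            };
        }
    }

    public static void main(String[] args) throws IOException {
        CountingFactory idleFactory = new CountingFactory();
        CountingFactory busyFactory = new CountingFactory();

        // One task that never emits a record, one that emits two.
        LazyRecordSink<String, Integer> idleTask = new LazyRecordSink<>(idleFactory);
        LazyRecordSink<String, Integer> busyTask = new LazyRecordSink<>(busyFactory);
        busyTask.write("apple", 3);
        busyTask.write("pear", 5);
        idleTask.close();
        busyTask.close();

        System.out.println("idle task files created: " + idleFactory.created); // 0
        System.out.println("busy task files created: " + busyFactory.created); // 1
        System.out.print(busyFactory.contents);
    }
}
```

A task that never emits a record never calls the factory, so no empty part file appears. The trade-off is that an error while opening the file surfaces on the first write rather than when the writer is obtained.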
> Out of 30 million files/dirs, 4.5 million part- files were empty. 40 users
> had more than 10,000 empty files.
It sounds like there may also be another problem here. Are these users
specifying far too many reduces? For jobs with lots of empty files, how many
non-empty files are there, and how big are they?
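One way to answer those questions for a given job is to tally its part files. The sketch below assumes the job's output directory is reachable as a local path and that part files match the glob "part-*". PartFileStats and its Stats fields are hypothetical names for illustration. Directly on HDFS, Hadoop's FileSystem.listStatus and FileStatus.getLen would be the natural equivalents of the java.nio calls used here.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PartFileStats {

    // Hypothetical summary of the part files found in one job output directory.
    static final class Stats {
        int empty;
        int nonEmpty;
        long nonEmptyBytes;
        long largestBytes;

        double meanNonEmptyBytes() {
            return nonEmpty == 0 ? 0.0 : (double) nonEmptyBytes / nonEmpty;
        }
    }

    // Scans dir for regular files named "part-*" and tallies empty vs. non-empty ones.
    static Stats scan(Path dir) throws IOException {
        Stats s = new Stats();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : files) {
                if (!Files.isRegularFile(p)) {
                    continue; // skip subdirectories or other non-file entries
                }
                long size = Files.size(p);
                if (size == 0) {
                    s.empty++;
                } else {
                    s.nonEmpty++;
                    s.nonEmptyBytes += size;
                    s.largestBytes = Math.max(s.largestBytes, size);
                }
            }
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Stats s = scan(dir);
        System.out.printf("empty=%d nonEmpty=%d meanNonEmptyBytes=%.1f largestBytes=%d%n",
                s.empty, s.nonEmpty, s.meanNonEmptyBytes(), s.largestBytes);
    }
}
```

A high empty count alongside a few large non-empty files would suggest skewed keys or too many reduces rather than a problem the output format alone can fix.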
> Part files on the output filesystem are created irrespective of whether the corresponding task has anything to write there
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-4927
> URL: https://issues.apache.org/jira/browse/HADOOP-4927
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Reporter: Devaraj Das
> Fix For: 0.20.0
>
>
> When OutputFormat.getRecordWriter is invoked, a part file is created on the
> output filesystem. But the created RecordWriter is not used until the task
> (user's code) calls OutputCollector.collect. As a result, an empty part file
> is left behind whenever the corresponding task never invokes
> OutputCollector.collect.
--
This message is automatically generated by JIRA.