[
https://issues.apache.org/jira/browse/HADOOP-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658455#action_12658455
]
Devaraj Das commented on HADOOP-4927:
-------------------------------------
Okay, so I figured out that I was referring to the old MapReduce API *smile*
There seem to be two approaches anyway. For the old API:
Today, the getRecordWriter calls relevant to the tasks are made in two places -
in DirectMapOutputCollector (in the constructor) and in ReduceTask.java (just
before starting to call the user's reduce method). We can probably move the
calls to the respective OutputCollector.collect implementations:
{code}
if (out == null) {
out = job.getOutputFormat().getRecordWriter(fs, job, finalName, reporter);
}
{code}
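To make the lazy-creation idea concrete, here is a minimal, self-contained sketch. It does not use the Hadoop classes: Writer, WriterFactory, LazyCollector and CountingFactory are hypothetical stand-ins (WriterFactory plays the role of the getRecordWriter call), so treat it as an illustration of the pattern rather than the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyCollectorSketch {

    /** Hypothetical stand-in for a record writer (not the Hadoop interface). */
    interface Writer {
        void write(String key, String value);
    }

    /** Hypothetical stand-in for the getRecordWriter call that creates the part file. */
    interface WriterFactory {
        Writer create();
    }

    /** Collector that defers writer creation to the first collect call. */
    static final class LazyCollector {
        private final WriterFactory factory;
        private Writer out; // null until the task emits its first record

        LazyCollector(WriterFactory factory) {
            this.factory = factory;
        }

        void collect(String key, String value) {
            if (out == null) {
                out = factory.create(); // the "part file" comes into existence here
            }
            out.write(key, value);
        }
    }

    /** Factory that counts how many writers (that is, part files) it has created. */
    static final class CountingFactory implements WriterFactory {
        int created = 0;
        final List<String> records = new ArrayList<>();

        @Override
        public Writer create() {
            created++;
            return (key, value) -> records.add(key + "\t" + value);
        }
    }

    public static void main(String[] args) {
        CountingFactory idle = new CountingFactory();
        new LazyCollector(idle); // task that never emits anything
        System.out.println("idle task writers created: " + idle.created);

        CountingFactory busy = new CountingFactory();
        LazyCollector collector = new LazyCollector(busy);
        collector.collect("a", "1");
        collector.collect("b", "2");
        System.out.println("busy task writers created: " + busy.created);
    }
}
```

A real change would also have to close the writer only if it was actually created, which the sketch leaves out for brevity.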
For the new API, I am not yet sure what a good approach is. Maybe we could
delay creating the RecordWriter until TaskInputOutputContext.write is invoked.
The other approach is to delay the creation of the files on the output
filesystem, until it is necessary, in the respective RecordWriter
implementations. But this requires users (who have implemented RecordWriters or
are implementing them in the future) to be aware of such a change, and is thus
error-prone.
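As a rough illustration of this second approach, the sketch below shows a writer that opens its file only on the first write. It uses plain java.nio on the local filesystem instead of the Hadoop FileSystem and RecordWriter classes, and the class name LazyFileWriterSketch and the part-0000x file names are made up for the example. Every RecordWriter implementation would need the same kind of guard, which is where the burden on users comes from.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LazyFileWriterSketch {
    private final Path path;
    private BufferedWriter out; // null until the first record arrives

    LazyFileWriterSketch(Path path) {
        this.path = path; // nothing is created on disk here
    }

    void write(String key, String value) {
        try {
            if (out == null) {
                // The file is created only when there is something to write.
                out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
            }
            out.write(key + "\t" + value);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void close() {
        try {
            if (out != null) { // skip close if the file was never opened
                out.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns {file exists for a task with no records, file exists for a task with one record}. */
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("lazy-writer-sketch");
            Path idlePart = dir.resolve("part-00000");
            Path busyPart = dir.resolve("part-00001");

            LazyFileWriterSketch idle = new LazyFileWriterSketch(idlePart);
            idle.close(); // task finished without writing anything

            LazyFileWriterSketch busy = new LazyFileWriterSketch(busyPart);
            busy.write("k", "v");
            busy.close();

            return new boolean[] {Files.exists(idlePart), Files.exists(busyPart)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("idle task part file exists: " + result[0]);
        System.out.println("busy task part file exists: " + result[1]);
    }
}
```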
Thoughts?
> Part files on the output filesystem are created irrespective of whether the
> corresponding task has anything to write there
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-4927
> URL: https://issues.apache.org/jira/browse/HADOOP-4927
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Devaraj Das
> Fix For: 0.20.0
>
>
> When OutputFormat.getRecordWriter is invoked, a part file is created on the
> output filesystem. But the created RecordWriter is not used until the
> OutputCollector.collect call is made by the task (user's code). This results
> in empty part files even if the OutputCollector.collect is never invoked by
> the corresponding tasks.