[ https://issues.apache.org/jira/browse/HADOOP-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500489 ]
Owen O'Malley commented on HADOOP-1201:
---------------------------------------
But once progress reporting has been moved into a separate thread, the ping
provides little value. The interface would look like:
{code}
boolean updateState(String taskid, int progressCount, float progress,
                    String state, TaskStatus.Phase phase, Counters count);
{code}
I don't see the point of having two threads that both call up to the task
tracker every second, especially since the ping thread is so trivial.
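
A minimal sketch of the single-thread arrangement, under stated assumptions: the task only writes fields, and one reporter thread sends them at a fixed interval, with the boolean result of the call standing in for the ping. ProgressReporterSketch, Umbilical, TaskProgress, and Reporter are hypothetical names, not existing Hadoop classes. The phase and counters arguments of updateState are left out to keep the example self-contained, and the meaning given to progressCount (number of updates so far) is a guess.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; none of these names are real Hadoop classes. */
public class ProgressReporterSketch {

    /** Stand-in for the task tracker side of the call. */
    interface Umbilical {
        // Returning false tells the task the tracker no longer knows it,
        // which is the job a separate ping would otherwise do (assumption).
        boolean updateState(String taskId, int progressCount, float progress, String state);
    }

    /** Fields the task updates inline; setting them involves no RPC. */
    static class TaskProgress {
        private final AtomicInteger progressCount = new AtomicInteger();
        private volatile float progress;
        private volatile String state = "";

        void set(float newProgress, String newState) {
            progress = newProgress;
            state = newState;
            progressCount.incrementAndGet(); // assumed meaning: updates so far
        }
    }

    /** The only thread that calls up to the tracker. */
    static class Reporter implements Runnable {
        private final String taskId;
        private final TaskProgress status;
        private final Umbilical umbilical;
        private final long intervalMillis;
        private volatile boolean done;

        Reporter(String taskId, TaskProgress status, Umbilical umbilical, long intervalMillis) {
            this.taskId = taskId;
            this.status = status;
            this.umbilical = umbilical;
            this.intervalMillis = intervalMillis;
        }

        /** One combined status-and-liveness call. */
        boolean sendOnce() {
            return umbilical.updateState(
                taskId, status.progressCount.get(), status.progress, status.state);
        }

        void stop() {
            done = true;
        }

        @Override
        public void run() {
            while (!done) {
                if (!sendOnce()) {
                    return; // the tracker has disowned this task
                }
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TaskProgress status = new TaskProgress();
        Umbilical printing = (id, count, progress, state) -> {
            System.out.println(id + " count=" + count
                + " progress=" + progress + " state=" + state);
            return true;
        };
        Reporter reporter = new Reporter("task_0001", status, printing, 1000L);
        Thread thread = new Thread(reporter, "progress-reporter");
        thread.setDaemon(true);
        thread.start();

        // The reducer loop only touches fields; it never makes a remote call.
        for (int i = 1; i <= 4; i++) {
            status.set(i / 4.0f, "reduce");
        }
        reporter.stop();
        thread.interrupt();
        thread.join();
    }
}
```

With this shape the reducer pays only for a couple of field writes per record, and the tracker sees one call per interval regardless of how often the task updates its progress.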
> Progress reporting can be improved for both Map/Reduce tasks
> ------------------------------------------------------------
>
> Key: HADOOP-1201
> URL: https://issues.apache.org/jira/browse/HADOOP-1201
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: Devaraj Das
>
> Both the map and reduce tasks do progress reporting in separate threads.
> However, in the ReduceTask, after the sort phase, the progress reporting
> happens inline with the reducer invocations. This slows down the Reduce phase
> since RPC is involved for every progress report. The better thing to do would
> be to do progress reporting for all phases in separate threads and have the
> tasks just update the progress fields.
> One proposal is to extract out the reporting stuff that is there in
> MapTask/ReduceTask and put it in the Task superclass as a new class, and have
> methods in the new class that control what/when progress is reported.
> Thoughts?