[ https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058203#comment-13058203 ]

Todd Lipcon commented on MAPREDUCE-2634:
----------------------------------------

Proposal #1 seems like an interesting idea, but I'm skeptical that it will make
a big difference, since we've already lowered the minimum heartbeat interval to
300ms in MAPREDUCE-1906.
Proposal #2 seems scary, since setup and cleanup may run user code, and running
user code in the JobTracker JVM is insecure. Piggybacking those tasks on other
map tasks, though, is probably a good idea (for some reason, I don't think we do
this with JVM reuse today).
Your proposals #3 and #4 are already implemented by MAPREDUCE-270, if I
understand you correctly.
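
For readers unfamiliar with the mechanism behind proposals #3 and #4 (which Todd notes MAPREDUCE-270 already covers), here is a minimal Python sketch of a tracker heartbeat timer that normally waits a fixed interval but can be woken early when a task finishes. The class and method names (`OutOfBandHeartbeat`, `request_heartbeat`, `wait_for_next_heartbeat`) are hypothetical and do not mirror Hadoop's actual TaskTracker code; the sketch only models the timing behavior.

```python
import threading
import time


class OutOfBandHeartbeat:
    """Hypothetical model of a tracker's heartbeat timer (not Hadoop's API).

    The tracker normally waits `interval_s` seconds between heartbeats, but a
    finishing task can call request_heartbeat() to cut the wait short.
    """

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self._wakeup = threading.Event()

    def request_heartbeat(self):
        # Called from a task-completion callback (e.g. a map or reduce task
        # finishing) to ask for an early heartbeat.
        self._wakeup.set()

    def wait_for_next_heartbeat(self):
        """Block until the next heartbeat is due.

        Returns True if woken early by request_heartbeat(), or False if the
        regular interval simply elapsed.
        """
        woken_early = self._wakeup.wait(timeout=self.interval_s)
        # Reset before the caller gathers task status for the heartbeat, so a
        # completion that arrives afterwards triggers another early heartbeat.
        self._wakeup.clear()
        return woken_early


if __name__ == "__main__":
    hb = OutOfBandHeartbeat(interval_s=3.0)
    # Simulate a map task finishing 0.1 s into the 3 s interval.
    threading.Timer(0.1, hb.request_heartbeat).start()
    start = time.monotonic()
    early = hb.wait_for_next_heartbeat()
    print(f"early={early}, waited {time.monotonic() - start:.2f} s")
```

A `threading.Event` keeps the model simple: the timeout provides the periodic heartbeat, and `set()` from a completion callback shortens the wait. A real tracker would likely also need to rate-limit such early heartbeats so that many short tasks finishing together do not flood the JobTracker.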

> MapReduce Performance Improvements using forced heartbeat 
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-2634
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2634
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Abhijit Suresh Shingate
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The following proposals would yield some performance optimizations in
> MapReduce:
> *1. Notify TaskTracker to send heartbeat when a new Job is submitted*
>   a) Presently, when a new Job is submitted to the JobTracker, its tasks are
> assigned to a TaskTracker only when that TaskTracker sends a heartbeat to the
> JobTracker.
>   b) Proposal:
>         - The JobTracker will notify all TaskTrackers to send a heartbeat
> whenever a new Job is submitted, so that the Tasks of the new Job can be
> assigned to all TaskTrackers immediately.
> *2. Execute Job Setup and Cleanup on JobTracker JVM*
>   a) Presently, Job Setup and Cleanup are each carried out as a separate task
> on a TaskTracker.
>   b) Launching a new JVM for Job Setup and Cleanup introduces some overhead,
> generally about 0.7-1.5 seconds.
>   c) Proposal:
>         - The JobTracker will execute the Job Setup and Cleanup tasks within
> the JobTracker JVM itself.
> *3. Request TaskTracker to send heartbeat when the Map Task is completed.*
>   a) Presently, the TaskTracker reports the status of completed Map Tasks as
> part of its heartbeat, at a regular interval.
>   b) Proposal:
>         - The Map Task requests the TaskTracker to send a heartbeat to the
> JobTracker as soon as the Map Task completes, so that Reduce Tasks learn
> quickly which map tasks have finished and can copy their map outputs locally.
> *4. Request JobTracker to trigger committing of Reduce output when Reduce 
> Task has finished.*
>   a) Presently, the JobTracker asks the Reduce Task to commit its output to
> HDFS through the heartbeat response.
>   b) Proposal:
>         - The Reduce Task requests the TaskTracker to send a heartbeat to the
> JobTracker as soon as the Reduce Task completes.
> These optimizations might help on small clusters, but on large clusters they
> may add overhead.
> Please let us know your views.
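
To make the small-cluster versus large-cluster trade-off concrete, the following Python sketch uses a deliberately simplified, assumed model of heartbeat scheduling: the JobTracker spaces heartbeats so it receives roughly `heartbeats_per_second` of them, but never more often than a minimum interval (300 ms here, echoing the MAPREDUCE-1906 floor mentioned in the comment above). The function names and the default of 100 heartbeats per second are illustrative assumptions, not Hadoop's exact formula or configuration.

```python
def heartbeat_interval_ms(num_trackers, heartbeats_per_second=100,
                          min_interval_ms=300):
    """Assumed model (not Hadoop's exact code): spread heartbeats so the
    JobTracker sees about `heartbeats_per_second` of them, but never ask a
    tracker to report more often than `min_interval_ms`."""
    return max(1000 * num_trackers / heartbeats_per_second, min_interval_ms)


def expected_assignment_delay_ms(num_trackers, **model):
    """Average wait until a given tracker's next heartbeat after a job is
    submitted, assuming heartbeat phases are spread uniformly over the
    interval. This is the latency proposal #1 tries to remove."""
    return heartbeat_interval_ms(num_trackers, **model) / 2


def forced_heartbeat_burst(num_trackers, **model):
    """Compare the burst caused by forcing every tracker to heartbeat at
    once (proposal #1) with the JobTracker's steady per-second load."""
    interval_s = heartbeat_interval_ms(num_trackers, **model) / 1000
    burst = num_trackers                      # all trackers report together
    steady_per_second = num_trackers / interval_s
    return burst, steady_per_second


if __name__ == "__main__":
    for n in (10, 1000):
        burst, steady = forced_heartbeat_burst(n)
        print(f"{n:>5} trackers: interval {heartbeat_interval_ms(n):.0f} ms, "
              f"avg wait {expected_assignment_delay_ms(n):.0f} ms, "
              f"burst {burst} vs steady {steady:.0f}/s")
```

Under these illustrative numbers, a 10-tracker cluster sits at the 300 ms floor (about 150 ms average wait), so forcing heartbeats on job submission saves little. A 1,000-tracker cluster waits about 5 s on average, but forcing all 1,000 trackers to report at once is a burst ten times the 100 heartbeats per second the JobTracker normally handles, which is the large-cluster overhead the proposal itself anticipates.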

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
