[jira] [Updated] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat

Abhijit Suresh Shingate (JIRA) Thu, 30 Jun 2011 20:07:05 -0700

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Abhijit Suresh Shingate updated MAPREDUCE-2634:
-----------------------------------------------

    Description: 
The following proposals are intended to improve MapReduce performance:

*1. Notify TaskTrackers to send a heartbeat when a new Job is submitted*
  a) Presently, when a new Job is submitted to the JobTracker, its tasks are 
assigned to a TaskTracker only when that TaskTracker next sends a heartbeat to 
the JobTracker.
  b) Proposal:
        - The JobTracker will notify all TaskTrackers to send a heartbeat 
whenever a new Job is submitted, so that the tasks of the new Job can be 
assigned to the TaskTrackers immediately.

*2. Execute Job Setup and Cleanup on the JobTracker JVM*
  a) Presently, Job Setup and Cleanup are each carried out as a separate task 
on a TaskTracker.
  b) Launching a new JVM for the Setup and Cleanup of a Job introduces some 
overhead. It generally takes about 0.7 to 1.5 seconds.
  c) Proposal:
        - The JobTracker will execute the Job Setup and Cleanup tasks in its 
own JVM, so no separate task JVM has to be launched for them.

*3. Request TaskTracker to send a heartbeat when a Map Task is completed.*
  a) Presently, the TaskTracker reports the status of completed Map Tasks as 
part of its heartbeat, which is sent at a regular interval.
  b) Proposal:
        - The Map Task requests the TaskTracker to send a heartbeat to the 
JobTracker as soon as the Map Task completes, so that Reduce Tasks learn 
sooner which Map Tasks have finished and can copy the map outputs locally.

*4. Request JobTracker to trigger committing of Reduce output when a Reduce 
Task has finished.*
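  a) Presently, the JobTracker asks the Reduce Task to commit its output to 
HDFS through a heartbeat response.
  b) Proposal:
        - The Reduce Task requests the TaskTracker to send a heartbeat to the 
JobTracker as soon as the Reduce Task completes, so that the commit request 
can be returned in the response to that heartbeat instead of waiting for the 
next regular one.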
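
The forced-heartbeat idea behind proposals 1, 3 and 4 can be sketched as a 
heartbeat loop that normally sleeps for the regular interval but can be woken 
early. This is only an illustration in Python, not Hadoop code: the class 
HeartbeatSender, its methods and the send callback are hypothetical names 
introduced here, and a real TaskTracker would need more care around 
threading and RPC failures.

```python
import threading


class HeartbeatSender:
    """Toy stand-in for a TaskTracker heartbeat loop (hypothetical, not Hadoop)."""

    def __init__(self, interval_s, send):
        self.interval_s = interval_s    # regular heartbeat interval in seconds
        self.send = send                # callback that performs one heartbeat
        self._wake = threading.Event()  # set to request an early heartbeat
        self._stop = threading.Event()  # set to end the loop

    def force_heartbeat(self):
        """Request a heartbeat now, e.g. on task completion or job submission."""
        self._wake.set()

    def stop(self):
        """Ask the loop to exit and wake it if it is currently waiting."""
        self._stop.set()
        self._wake.set()

    def run(self):
        while not self._stop.is_set():
            # Sleep for the regular interval, or return early when forced.
            forced = self._wake.wait(timeout=self.interval_s)
            self._wake.clear()
            if self._stop.is_set():
                break
            # forced is True for an early heartbeat, False for a regular one.
            self.send(forced)
```

In this sketch, several force_heartbeat() calls that arrive before the loop 
wakes up collapse into a single heartbeat, which is one possible way to limit 
the extra load when many tasks finish at about the same time.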

These optimizations may help on small clusters, but on large clusters they may 
add overhead.
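
A rough way to reason about that overhead is to count the additional 
heartbeats the proposals could generate. The function below is a 
back-of-the-envelope sketch under a loudly stated assumption: every trigger 
causes exactly one extra heartbeat and none are merged with regular ones. The 
function name and the example figures are hypothetical and are not 
measurements.

```python
def extra_heartbeats(trackers, jobs, map_tasks, reduce_tasks):
    """Upper bound on additional heartbeats under proposals 1, 3 and 4.

    Assumes one extra heartbeat per trigger and no coalescing (an assumption,
    not a measured property of any cluster). Task counts are totals over all jobs.
    """
    on_submit = jobs * trackers    # proposal 1: every TaskTracker, once per job
    on_map_done = map_tasks        # proposal 3: one per completed Map Task
    on_reduce_done = reduce_tasks  # proposal 4: one per completed Reduce Task
    return on_submit + on_map_done + on_reduce_done


# Hypothetical small cluster: 10 TaskTrackers, 1 job, 40 maps, 4 reduces.
small = extra_heartbeats(10, 1, 40, 4)            # 54
# Hypothetical large cluster: 1000 TaskTrackers, 100 jobs, 4000 maps, 400 reduces.
large = extra_heartbeats(1000, 100, 4000, 400)    # 104400
```

Under these assumptions the per-job term of proposal 1 grows with the number 
of TaskTrackers, which is why the cost is more of a concern on large clusters.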

Please let us know your views.


  was:
Following are the proposals which would cause some performance optimizations 
over MapReduce

1.Notify TaskTracker to send heartbeat  when a new Job is submitted
  a) Presently when new Job is submitted to JobTracker, the tasks are assigned 
to TaskTracker only when the TaskTracker sends heartbeat  to JobTracker
  b) Proposal:
    (1). JobTracker will notify all TaskTrackers to send heartbeat to 
JobTracker whenever a new Job is submitted to JobTracker. So that the Tasks of 
the new Job can be immediately assigned to all TaskTrackers. 

2. Execute Job Setup and Cleanup on JobTracker JVM
  a) Presently Job Setup and Cleanup is carried out as a separated task on 
TaskTracker
  b) Launching a new JVM for Setup and Cleanup of the Job introduces some 
amount of overhead. It takes generally about 0.7 - 1.5 seconds.
  c) Proposal:
    (1). JobTracker will execute the Job Setup and Cleanup tasks on the 
JobTracker JVM only.
3. Request TaskTracker to send heartbeat when the Map Task is completed.
  a) Presently TaskTracker reports status of completed Map Tasks as part of 
heartbeat at a regular interval.
  b) Proposal:
   (1). Map Task requests TaskTracker to send heartbeat to JobTracker when Map 
Task is completed. So that Reduce task can quickly know which map task is 
finished and copy map outputs to local.
4. Request JobTracker to trigger committing of Reduce output when Reduce Task 
has finished. 
  a) Presently JobTracker will ask the Reduce Task to commit its output to HDFS 
through heartbeat response.
  b) Proposal:
   (1). Reduce Task requests TaskTracker to send heartbeat to JobTracker 
whenever Reduce Task is completed.

These optimizations might work on small clusters but on big clusters it may be 
overhead.

Please let us know your views.



> MapReduce Performance Improvements using forced heartbeat 
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-2634
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2634
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Abhijit Suresh Shingate
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
