[ https://issues.apache.org/jira/browse/MAPREDUCE-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
yixiaohua updated MAPREDUCE-278: -------------------------------- Attachment: TestUTF8AndStringGetBytes.java > Proposal for redesign/refactoring of the JobTracker and TaskTracker > ------------------------------------------------------------------- > > Key: MAPREDUCE-278 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-278 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Reporter: Arun C Murthy > Assignee: Sharad Agarwal > Attachments: Job_Tracker_FSM.pdf, TestUTF8AndStringGetBytes.java, > mapred_as_dfa.patch > > > During discussions on HADOOP-815 wrt some hard-to-maintain code on the > JobTracker we all agreed that the current state-of-affairs there is brittle > and merits some rework. > Case in point: there are back-calls from TaskInProgress to JobTracker and > from JobInProgress to JobTracker which mean that synchronization is quite > involved and brittle, leading to issues like HADOOP-600. Also one is forced > to lock several data-structures individually before certain operations > (taskTrackers, trackerExpiryQueue, jobs etc.) > Hence I'd like to present some early thoughts (which have undergone a quick > iteration) on how we could do slightly better by a bit of > redesign/refactoring, also during discussions with Owen on the same we agreed > that HADOOP-554 is an integral part along the same direction... and I also > feel that a good candidate to be done along with this is HADOOP-398 (mapred > package refactoring). > Context: > --------- > a) The unit of communication between the JobTracker & TaskTracker is a 'task'. > b) Due to (a) the JobTracker maintains a bunch of information related on the > 'taskid' i.e. taskidToTipMap, taskidToTrackerMap etc. and hence we need to > update the JobTracker's data-structures via back-calls from TaskInProgress & > JobInProgress where the context is available (complete/failed task, > already-completed task etc.) > c) This implies that we have a fairly elaborate and hard to maintain locking > structures and also some redundant information in the JobTracker; making it > harder to maintain. > Overall at both the JobTracker & TaskTracker the concept of a 'job' is > overshadowed by the 'task'; which I propose we fix. > Proposal: > ---------- > Here is the main flow of control: > JobTracker -> JobInProgress -> TaskInProgress -> task_attempt > The main idea is to break the existing nexus between the JobTracker & > TaskInProgress/taskid by (I've put code for illustrative purposes only, and > ignored pieces irrelevant to this discussion): > a) Making the 'job' the primary unit of communication between JobTracker & > TaskTracker. > b) TaskTrackerStatus now looks like this: > class TaskTrackerStatus { > List<JobStatus> jobStatuses; // the status of the 'jobs' running on a > TaskTracker > String getTrackerName(); > } > class JobStatus { > List<TaskStatus> taskStatuses; // the status of the 'tasks' belonging to > a job > JobId getJobId(); > } > c) The JobTracker maintains only a single map of jobid -> JobInProgress, and > mapping from taskTracker -> List<JobInProgress> > Map<JobId, JobInProgress> allJobs; > Map<String, List<JobInProgress>> trackerToJobsMap; > d) The JobTracker delegates a bunch of responsibilities to the JobInProgress > to reflect the fact the primary 'concept' in map/reduce is the 'job', thus > empowering the JobInProgress class: > class JobInProgress { > TaskInProgress[] mapTasks; > TaskInProgress[] reduceTasks; > > Map<String, List<TaskInProgress>> trackerToTasksMap; // tracker -> tasks > running > Map<String, List<TaskAttempt>> trackerToMarkedTasksMap; // tracker -> > completed (success/failed/killed) task-attempt, > > // but the tracker doesn't know it yet > void updateStatus(JobStatus jobStatus); > MapOutputLocation[] getMapOutputLocations(int[] mapTasksNeeded, int > reduce); > TaskAttempt getTaskToRun(String taskTracker); > List<TaskTrackerAction> getTaskToKill(String taskTracker); > } > > d) On receipt of TaskTrackerStatus from a tracker, the processeing of > heartbeat looks like this: > for (JobStatus jobStatus : taskTrackerStatus.getJobStatuses()) { > JobInProgress job = allJobs.get(jobId); > synchronized (job) { > job.updateStatus(jobStatus); > return (HeartbeatResponse(repsonseId, > job.getTaskAttemptToRun(trackerName), > job.getTaskToKill(trackerName) > )); > } > } > > The big change is that the JobTracker delegates a lot of responsibility to > the JobInProgress, we get away from all the complicated synchronization > constructs: simply lock the JobInProgress object at all places via > allJobs/trackerToJobsMap and we are done. This also enhances throughput since > mostly we will not need to lock up the JobTracker (even in the heartbeat > loop); locking the JobInProgress or the 2 maps is sufficient in most cases... > thus enhance the inherent parallelism of the JobTracker's inner loop > (processing heartbeat) and provide better response when multiple jobs are > running on the cluster. > Hence the JobInProgress is responsible for maintaining it's TaskInProgress'es > which in turn are completely responsible for the TaskAttempt`s, the > JobInProgress also provides sufficient information as and when needed to the > JobTracker to schedule jobs/tasks and the JobTracker is blissfully unaware of > the innards of jobs/tasks. > -*-*- > I hope to articulate more a general direction towards an improved and > maintainable 'mapred' and would love to hear out how we can improve and > pitfalls to avoid... lets discuss. We could take this piecemeal an implement > or at one go... > Last, not least; I propose that while we are at this we redo the nomenclature > a bit: > JobInProgress -> Job > TaskInProgress -> Task > taskid -> replace with a new TaskAttempt > this should help clarify each class and it's roles. > Of course we will probably need a separate org.apache.hadoop.mapred.job.Task > v/s org.apache.hadoop.mapred.task.Task which is why I feel HADOOP-554 > (refactoring of mapred packages) would be very important to get a complete, > coherent solution. > Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira