[ https://issues.apache.org/jira/browse/HADOOP-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558749#action_12558749 ]
Runping Qi commented on HADOOP-2491:
------------------------------------

A great analysis. +2

I especially like the concept of a task scheduler.
The task scheduler should decide which task runs on which node based on the following information:

1. which job the task belongs to
2. task type (mapper/reducer/...)
3. the input data for the task (data locality)
4. the output data size (temp space requirement)
5. the load of the node (current and historical data)
6. the capacity of the node (enough temp disk space?)

> generalize the TT / JT servers to handle more generic tasks
> -----------------------------------------------------------
>
>                 Key: HADOOP-2491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2491
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: eric baldeschwieler
>
> We've been discussing a proposal to generalize the TT / JT servers to handle
> more generic tasks and move job-specific work out of the job tracker and into
> client code, so the whole system is both much more general and has more
> coherent layering. The result would look more like Condor/PBS-like systems
> (or presumably Borg) with map-reduce as a user job.
> Such a system would allow the current map-reduce code to coexist with other
> work-queuing libraries or maybe even persistent services on the same Hadoop
> cluster, although that would be a stretch goal. We'll kick off a thread with
> some documents soon.
> Our primary goal in going this way would be to get better utilization out of
> map-reduce clusters and support a richer scheduling model. The ability to
> support alternative job frameworks would just be gravy!
> ----
> Putting this in as a placeholder. Hope to get folks talking about this to
> post some more detail.
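
To make the six scheduling inputs from the comment concrete, here is a rough sketch in Python of how a scheduler might combine them. It is not Hadoop code: the `Task` and `Node` classes, the `score` and `pick_node` functions, the per-type `free_slots` map, and the 0.7/0.3 load weights are all hypothetical names and values chosen for illustration. Temp space and slot availability are treated as hard constraints, while locality and load only affect ranking.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    job_id: str             # 1. which job the task belongs to
    task_type: str          # 2. e.g. "mapper" or "reducer"
    input_hosts: Set[str]   # 3. hosts that hold the task's input data
    output_bytes: int       # 4. estimated temp space the output needs


@dataclass
class Node:
    host: str
    current_load: float     # 5. load right now, 0.0 (idle) to 1.0 (busy)
    historical_load: float  # 5. long-run average load, same scale
    free_temp_bytes: int    # 6. temp disk space still available
    # Hypothetical: free execution slots per task type.
    free_slots: Dict[str, int] = field(default_factory=dict)


def score(task: Task, node: Node) -> Optional[float]:
    """Return a ranking score, or None if the node cannot run the task."""
    # Hard constraints: a free slot of the right type and enough temp space.
    if node.free_slots.get(task.task_type, 0) <= 0:
        return None
    if node.free_temp_bytes < task.output_bytes:
        return None
    # Soft preferences: data locality is rewarded, load is penalized.
    locality = 1.0 if node.host in task.input_hosts else 0.0
    load = 0.7 * node.current_load + 0.3 * node.historical_load
    # task.job_id is not used here; a per-job policy (priorities, fair
    # share) would be the natural place to take it into account.
    return locality - load


def pick_node(task: Task, nodes: List[Node]) -> Optional[Node]:
    """Pick the highest-scoring eligible node, or None if none qualifies."""
    best, best_score = None, None
    for node in nodes:
        s = score(task, node)
        if s is not None and (best_score is None or s > best_score):
            best, best_score = node, s
    return best
```

With these weights a data-local node wins unless it is almost fully loaded, and a node without enough temp space is never chosen, however idle it is. Real weights would have to come from measurements on an actual cluster.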