Re: Mapred job parallelism
Indeed, you will need to enable the Fair Scheduler or the Capacity Scheduler (both are available in 0.19) to do this. mapred.map.tasks is more a hint than anything else -- if there are more files to map than the value you set, the job will use more map tasks than you configured. The newer schedulers will ensure that each job's many map tasks use only a portion of the available slots.

- Aaron
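As a rough sketch of what "enable the Fair Scheduler" means in practice: the scheduler is a contrib module that you point the JobTracker at via its configuration. The snippet below is an assumed minimal hadoop-site.xml fragment for 0.19; check the fair scheduler documentation shipped with your release for the exact property names and the optional allocations file.

```xml
<!-- hadoop-site.xml on the JobTracker (a sketch, not verified against
     your release): swap the default FIFO scheduler for the fair
     scheduler from contrib/fairscheduler -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```

With this in place, concurrent jobs get interleaved map slots instead of the first submitted job occupying all of them.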
Re: Mapred job parallelism
I believe that the scheduler code in 0.19.0 has a framework for this, but I haven't dug into it in detail yet.

http://hadoop.apache.org/core/docs/r0.19.0/capacity_scheduler.html

From what I gather, you would set up 2 queues, each with guaranteed access to 1/2 of the cluster, and then submit your jobs to alternate queues.

This is not ideal, as you have to balance which queue you submit jobs to, to ensure that there is some depth.
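The two-queue setup described above might look roughly like the following. This is an assumed sketch based on the r0.19.0 capacity scheduler docs linked above; the queue names queueA/queueB are made up, and the exact property names (especially guaranteed-capacity) should be checked against your release.

```xml
<!-- hadoop-site.xml: enable the capacity scheduler and declare 2 queues -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>queueA,queueB</value>
</property>

<!-- capacity-scheduler.xml: give each queue half the cluster -->
<property>
  <name>mapred.capacity-scheduler.queue.queueA.guaranteed-capacity</name>
  <value>50</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.queueB.guaranteed-capacity</name>
  <value>50</value>
</property>
```

Jobs would then be submitted to alternating queues, e.g. by setting mapred.job.queue.name=queueA on one job and queueB on the next.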
Mapred job parallelism
Hi Guys,

I was trying to set up a cluster so that two jobs can run simultaneously.

The conf:
number of nodes: 4 (say)
mapred.tasktracker.map.tasks.maximum=2

and in the JobClient:
mapred.map.tasks=4 (# of nodes)

I also have a condition that each job should have only one map task per node.

In short, I created 8 map slots and set the number of mappers to 4, so now we have two jobs running simultaneously.

However, I realized that if a tasktracker happens to die, I will potentially have 2 map tasks of one job running on a node.

Setting mapred.tasktracker.map.tasks.maximum=1 in the JobClient has no effect: it is a tasktracker property and can't be changed per job.

Any ideas on how to have 2 jobs running simultaneously?

-Sagar
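The slot arithmetic in the message above can be sketched as a toy simulation. This is plain Python, not Hadoop code; the greedy placement and the job names are invented for illustration, but the numbers (4 nodes x 2 slots, two jobs of 4 map tasks) match the setup described.

```python
# Toy slot-scheduling sketch of the setup above: 4 nodes with 2 map
# slots each gives 8 slots, filled by two jobs of 4 map tasks each.

def schedule(nodes, slots_per_node, jobs):
    """Greedily place each job's tasks on the node with the most free slots."""
    free = dict.fromkeys(nodes, slots_per_node)
    placement = {}
    for job, ntasks in jobs.items():
        placement[job] = []
        for _ in range(ntasks):
            node = max(free, key=free.get)        # emptiest node first
            assert free[node] > 0, "out of map slots"
            free[node] -= 1
            placement[job].append(node)
    return placement

nodes = ["node1", "node2", "node3", "node4"]
placement = schedule(nodes, 2, {"jobA": 4, "jobB": 4})

# With all four tasktrackers alive, each job lands exactly once per node:
assert sorted(placement["jobA"]) == nodes
assert sorted(placement["jobB"]) == nodes

# But if node4 dies, its jobA task must rerun on node1-node3, each of
# which already runs a jobA task -- hence 2 map tasks of one job on a
# node, which is exactly the failure case described in the message.
```

Nothing in the per-job configuration prevents that doubling after a failure, which is why the scheduler-level answers above are needed.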